Abstract

We derive a lower bound on the smallest output entropy that can be achieved via vector quantization of a
d-dimensional source with given expected rth-power distortion. Specialized to the one-dimensional case, and in the
limit of vanishing distortion, this lower bound converges to the output entropy achieved by a uniform quantizer,
thereby recovering the result by Gish and Pierce that uniform quantizers are asymptotically optimal as the allowed
distortion tends to zero. Our lower bound holds for all d-dimensional memoryless sources having finite differential
entropy and whose integer part has finite entropy. In contrast to Gish and Pierce, we do not require any additional
constraints on the continuity or decay of the source probability density function. For one-dimensional sources, the
derivation of the lower bound reveals a necessary condition for a sequence of quantizers to be asymptotically optimal
as the allowed distortion tends to zero. This condition implies that any sequence of asymptotically-optimal
almost-regular quantizers must converge to a uniform quantizer as the allowed distortion tends to zero.


I. Introduction 

Suppose we wish to quantize a memoryless source with an rth-power distortion not larger than $D$. More
specifically, suppose a source produces the sequence of independent and identically distributed, d-dimensional,
real-valued vectors $\{\mathbf{X}_k, k \in \mathbb{Z}\}$ according to the distribution $P_{\mathbf{X}}$ and we employ a vector
quantizer that produces a sequence of quantized symbols $\{\hat{\mathbf{X}}_k, k \in \mathbb{Z}\}$ satisfying

\[ \varlimsup_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathsf{E}\bigl[\|\mathbf{X}_k - \hat{\mathbf{X}}_k\|^r\bigr] \le D \tag{1} \]

for some norm $\|\cdot\|$ and some exponent $r > 0$. (We use $\varlimsup$ to denote the limit superior and $\varliminf$ to
denote the limit inferior.) Rate-distortion theory states that if for every blocklength $n$ and distortion constraint
$D$ we quantize the sequence of source vectors $\mathbf{X}_1, \ldots, \mathbf{X}_n$ to one of $e^{nR}$ possible sequences of
quantized symbols $\hat{\mathbf{X}}_1, \ldots, \hat{\mathbf{X}}_n$, then the smallest rate $R$ (in nats per source symbol) for
which there exists a vector quantizer satisfying (1) is given by [1]

\[ R(D) = \inf_{P_{\hat{\mathbf{X}}|\mathbf{X}}} I(\mathbf{X}; \hat{\mathbf{X}}) \tag{2} \]

where the infimum is over all conditional distributions of $\hat{\mathbf{X}}$ given $\mathbf{X}$ for which

\[ \mathsf{E}\bigl[\|\mathbf{X} - \hat{\mathbf{X}}\|^r\bigr] \le D \tag{3} \]

and where the expectation in (3) is computed with respect to the joint distribution
$P_{\mathbf{X}} P_{\hat{\mathbf{X}}|\mathbf{X}}$. Here and throughout this paper we omit the time indices where they are
immaterial. The rate $R(D)$ as a function of $D$ is referred to as the rate-distortion function.

While $R(D)$ characterizes the rate of the best vector quantizer that quantizes the source with rth-power distortion
not exceeding $D$, sometimes quantizing blocks of $n$ source symbols may not be feasible, especially if $n$ is large
(which is typically required to achieve (2)). In this case, it might be more practical to quantize each source symbol
separately using a vector quantizer, defined as a (deterministic) mapping $q(\cdot)$ from the source alphabet
$\mathcal{X}$ to the (countable) reconstruction alphabet $\hat{\mathcal{X}}$.

In this paper, we consider the symbol-wise quantization of d-dimensional source vectors. This setup is sufficiently 
general to comprise various problems of interest in high-resolution vector quantization. For example, it allows us to 
analyze the performance of quantization schemes that buffer d consecutive symbols of a one-dimensional memoryless 
source and then quantize them using a d-dimensional vector quantizer. Furthermore, the quantization of stationary 
sources with memory can be studied by combining the analysis of symbol-wise, d-dimensional quantization with a
limiting argument where $d \to \infty$.

We define the rate of the vector quantizer as the entropy of the quantized source symbol
$\hat{\mathbf{X}} = q(\mathbf{X})$. Thus, the smallest rate of a symbol-wise quantizer satisfying the distortion
constraint $D$ is given by

\[ R_{r,d}(D) \triangleq \inf_{q(\cdot)} H(q(\mathbf{X})) \tag{4} \]

where the infimum is over the set of quantizers $q(\cdot)$ satisfying (3). Since $\mathbf{X}$ determines the quantizer
output $q(\mathbf{X})$, we have $H(q(\mathbf{X})|\mathbf{X}) = 0$ and the rate $R_{r,d}(D)$ can be written in the same form
as (2) but with $P_{\hat{\mathbf{X}}|\mathbf{X}}$ replaced by $q(\cdot)$:

\[ R_{r,d}(D) = \inf_{q(\cdot)} I(\mathbf{X}; q(\mathbf{X})). \tag{5} \]

Since $\hat{\mathbf{X}} = q(\mathbf{X})$ corresponds to a deterministic $P_{\hat{\mathbf{X}}|\mathbf{X}}$, it follows that
$R_{r,d}(D) \ge R(D)$.

Any discrete memoryless source can be losslessly described by a variable-length code whose expected length
is roughly the entropy of the source [2], [3]. Consequently, $R_{r,d}(D)$ is the smallest expected length of a vector
quantization scheme that first quantizes each source symbol using a vector quantizer and then compresses the
resulting sequence of quantized symbols using a lossless variable-length code.

In this paper, we focus on the asymptotic rate-distortion tradeoff in the limit as the permitted distortion tends to
zero. Specifically, we study the asymptotic excess rate with respect to the rate-distortion function defined as

\[ \mathsf{R}_{r,d} \triangleq \varlimsup_{D \downarrow 0} \bigl\{ R_{r,d}(D) - R(D) \bigr\}. \tag{6} \]

For one-dimensional sources ($d = 1$) and quadratic distortion ($r = 2$), Gish and Pierce demonstrated that the excess
rate is equal to [4]

\[ \mathsf{R}_{2,1} = \frac{1}{2} \log \frac{\pi e}{6} \tag{7} \]

where $\log(\cdot)$ denotes the natural logarithm. They further showed that this excess rate can be achieved by a uniform
quantizer, hence the well-known result that “uniform quantizers are asymptotically optimal as the allowed distortion
tends to zero.”¹ For multi-dimensional sources, only bounds on $\mathsf{R}_{r,d}$ are available. To obtain (7), Gish and
Pierce [4] imposed constraints on the continuity and decay of the probability density function (pdf) of $X$.
Furthermore, they merely provide an intuitive explanation of their converse result together with an outline of the
proof; at the end of [4, Appendix II] they write “The complete proof is surprisingly long and will not be given here.”

The result (7) is equivalent to a result by Zador [6], which concerns the asymptotic excess distortion with respect
to the distortion-rate function as the rate tends to infinity. Indeed, let $D_{r,d}(R)$ denote the minimum distortion
achievable with a symbol-wise quantizer whose output has an entropy not exceeding $R$, i.e.,

\[ D_{r,d}(R) \triangleq \inf_{q(\cdot)} \mathsf{E}\bigl[\|\mathbf{X} - q(\mathbf{X})\|^r\bigr] \tag{8} \]

where the infimum is over the set of quantizers $q(\cdot)$ satisfying $H(q(\mathbf{X})) \le R$. Zador’s theorem states that

\[ \lim_{R \to \infty} e^{\frac{r}{d} R} D_{r,d}(R) = b_{r,d}\, e^{\frac{r}{d} h(\mathbf{X})} \tag{9} \]

where $b_{r,d}$ is a constant that only depends on $r$ and $d$ but not on the distribution of $\mathbf{X}$. Zador did not
evaluate the constant $b_{r,d}$, but he did provide upper and lower bounds on $b_{r,d}$ that become tight for large $d$.
Furthermore, for one-dimensional sources and quadratic distortion, it can be shown that $b_{2,1} = 1/12$. Taking
logarithms on both sides of (9), and replacing $R \leftrightarrow R_{r,d}(D)$ and $D_{r,d}(R) \leftrightarrow D$, we thus
obtain that

\[ R_{2,1}(D) = h(X) + \frac{1}{2} \log \frac{1}{D} - \frac{1}{2} \log 12 + o_R(1) \tag{10} \]

where $o_R(1)$ denotes error terms that vanish as $R$ tends to infinity. Furthermore, the rate-distortion function can
be approximated as [7]-[9]

\[ R(D) = h(X) + \frac{1}{2} \log \frac{1}{D} - \frac{1}{2} \log(2\pi e) + o_D(1) \tag{11} \]

where $o_D(1)$ denotes error terms that vanish as $D$ tends to zero. Hence, the equivalence of Zador’s theorem (9)
and Gish and Pierce’s result (7) follows by applying (10) and (11) to (6).
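As a numerical sanity check of the value $\frac{1}{2}\log(\pi e/6) \approx 0.1765$ nats in (7), the following sketch evaluates a uniform quantizer with midpoint reconstruction on a standard Gaussian source, for which $R(D) = \frac{1}{2}\log(1/D)$ when $D \le 1$. The function names, the step size, the truncation span, and the integration grid are illustrative assumptions and not part of the analysis in this paper.

```python
import math

def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def uniform_quantizer_stats(step, span=10.0, grid=50):
    """Output entropy (nats) and mean squared error of a uniform quantizer
    with cells [k*step, (k+1)*step) and midpoint reconstruction on N(0, 1).
    The source is truncated to [-span, span] (an assumption; the neglected
    tail mass is far below floating-point resolution for span = 10)."""
    cells = int(math.ceil(span / step))
    entropy, mse = 0.0, 0.0
    h = step / grid
    for k in range(-cells, cells):
        lo = k * step
        p = gaussian_cdf(lo + step) - gaussian_cdf(lo)
        if p > 0.0:
            entropy -= p * math.log(p)
        mid = lo + 0.5 * step
        # Midpoint-rule integration of (x - mid)^2 f(x) over the cell.
        for j in range(grid):
            x = lo + (j + 0.5) * h
            mse += (x - mid) ** 2 * gaussian_pdf(x) * h
    return entropy, mse

def excess_rate(step):
    """H(q(X)) - R(D) for the unit-variance Gaussian, R(D) = 0.5*log(1/D)."""
    entropy, mse = uniform_quantizer_stats(step)
    return entropy - 0.5 * math.log(1.0 / mse)

target = 0.5 * math.log(math.pi * math.e / 6.0)
print(excess_rate(0.05), target)
```

For small step sizes the two printed numbers should agree closely, which illustrates (but of course does not prove) that the uniform quantizer attains the excess rate (7).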

While Zador’s original proof of (9) was flawed, a rigorous proof for quadratic distortion was given by Gray,
Linder, and Li by using a Lagrangian formulation of variable-rate vector quantization [10]. Their proof follows
Zador’s approach of 1) proving the result for sources with a uniform pdf on the unit cube; 2) extending it to
piecewise constant pdfs on disjoint cubes of equal sides; 3) proving the result for a general pdf on a cube; and 4)
proving the result for general pdfs. Gray et al. do not impose any constraints on the continuity or decay of the pdf
of $\mathbf{X}$, so their proof is more general than the proofs by Zador [6] and by Gish and Pierce [4].

¹The fact that, in the high-resolution case, the expected quadratic distortion of uniform scalar quantization exceeds
the least distortion achievable by any quantization scheme by a factor of only $\pi e/6$ was already discovered by
Koshelev in 1963. See [5] and references therein for more details.

In this paper, we derive a lower bound on $\mathsf{R}_{r,d}$ that recovers (7) for one-dimensional sources and quadratic
distortion. In contrast to [10], our proof follows essentially along the lines outlined by Gish and Pierce [4]. We do
not impose any constraints on the continuity or decay of the pdf of $\mathbf{X}$, so our proof is as general as the proof
by Gray et al., and it is more general than the proof by Gish and Pierce.

For one-dimensional sources, the derivation of the lower bound reveals a necessary condition for a sequence
of quantizers (parametrized by $D$) to achieve the asymptotic excess rate $\mathsf{R}_{r,1}$. We apply this condition to the
family of almost-regular quantizers, which was introduced by György and Linder in [11] and includes the uniform
quantizers. Almost-regular quantizers are relevant because they achieve $D_{r,1}(R)$ when $r > 1$ [11, Theorem 3].
Thus, for one-dimensional sources and rth-power distortion with $r > 1$, we can restrict ourselves to almost-regular
quantizers without loss of optimality. The necessary condition implies that any sequence of almost-regular quantizers
achieving $\mathsf{R}_{r,1}$ must converge to a uniform quantizer as $D \to 0$. This suggests that asymptotically-optimal
quantizers must essentially be uniform.

The rest of this paper is organized as follows. Section II introduces the problem setup and presents the main
result of this paper, Theorem 1. Section III provides a back-of-the-envelope derivation of Theorem 1 that serves as
an outline for the proof. Section IV contains the complete proof of this theorem. Section V presents a necessary 
condition for a sequence of quantizers to achieve the asymptotic excess rate. Section VI assesses the tightness of the 
lower bound presented in Theorem 1 for multi-dimensional sources by numerically comparing it to several upper 
bounds achievable by lattice quantizers. Section VII concludes the paper with a summary and discussion of the 
results. 


II. Problem Setup and Main Result 

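We consider a d-dimensional, real-valued source $\mathbf{X}$ with support $\mathcal{X} \subseteq \mathbb{R}^d$ whose
distribution is absolutely continuous with respect to the Lebesgue measure, and we denote its pdf by $f_{\mathbf{X}}$.
We require the source to satisfy the following two conditions:

C1 $\mathbf{x} \mapsto f_{\mathbf{X}}(\mathbf{x}) \log f_{\mathbf{X}}(\mathbf{x})$ is integrable, ensuring that the differential entropy

\[ h(\mathbf{X}) \triangleq -\int_{\mathcal{X}} f_{\mathbf{X}}(\mathbf{x}) \log f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} \tag{12} \]

is well-defined and finite;

C2 the integer part of the source $\mathbf{X}$ has finite entropy, i.e.,

\[ H(\lfloor \mathbf{X} \rfloor) < \infty. \tag{13} \]

Here $\lfloor \mathbf{a} \rfloor$, $\mathbf{a} = (a_1, \ldots, a_d) \in \mathbb{R}^d$ denotes the element-wise floor function, i.e.,
$\lfloor \mathbf{a} \rfloor = (\lfloor a_1 \rfloor, \ldots, \lfloor a_d \rfloor)$ where $\lfloor a_i \rfloor$, $i = 1, \ldots, d$ denotes
the largest integer not larger than $a_i$.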

Condition C2 requires that quantizing the source with a cubic lattice quantizer of unit-volume cells gives rise to
a discrete random variable of finite entropy. This is necessary for the asymptotic excess rate $\mathsf{R}_{r,d}$ to be
well-defined. Indeed, as demonstrated in [9], if $H(\lfloor \mathbf{X} \rfloor) = \infty$ then the rate-distortion function
$R(D)$ is infinite for any finite $D$. Since $R_{r,d}(D) \ge R(D)$, this implies that in this case $R_{r,d}(D) - R(D)$ is of
the form $\infty - \infty$. Fortunately, Condition C2 is very mild. For example, by generalizing [12, Proposition 1] to
the vector case, it can be shown that it is satisfied if $\mathsf{E}[\log(1 + \|\mathbf{X}\|)] < \infty$. This in turn is true,
for example, for sources for which $\mathsf{E}[\|\mathbf{X}\|^{\alpha}] < \infty$ for some $\alpha > 0$.

The quantity $H(\lfloor \mathbf{X} \rfloor)$ is intimately related with the Rényi information dimension defined in [13];
see also [12], [14]. Indeed, it can be shown that a source vector has finite Rényi information dimension if, and
only if, (13) is satisfied [12, Proposition 1].

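The quantizer is characterized by the (Borel measurable) function $q \colon \mathcal{X} \to \hat{\mathcal{X}}$ for some
countable reconstruction alphabet $\hat{\mathcal{X}} \subseteq \mathbb{R}^d$. Equivalently, we characterize $q(\cdot)$ by the
quantization regions $\mathcal{S}_i$, $i \in \mathbb{Z}$ and corresponding reconstruction values $\hat{\mathbf{x}}_i$,
$i \in \mathbb{Z}$. Specifically, $\mathcal{S}_i$, $i \in \mathbb{Z}$ are disjoint (Borel measurable) subsets of $\mathbb{R}^d$ that
together with the reconstruction values $\hat{\mathbf{x}}_i$, $i \in \mathbb{Z}$ satisfy

\[ \bigcup_i \mathcal{S}_i = \mathcal{X} \tag{14a} \]

\[ q(\mathbf{x}) = \sum_i \hat{\mathbf{x}}_i \mathbf{1}\{\mathbf{x} \in \mathcal{S}_i\}, \quad \text{for } \mathbf{x} \in \mathcal{X} \tag{14b} \]

where $\mathbf{1}\{\cdot\}$ denotes the indicator function. To simplify notation, we denote the Lebesgue measure of the
quantization region $\mathcal{S}_i$ by $\Delta_i$ and the probability of $\mathbf{X}$ being in $\mathcal{S}_i$ by $p_i$.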

The main result of this paper is a lower bound on the excess rate for general r and d. For one-dimensional 
sources and quadratic distortion, it recovers the excess rate (7) by Gish and Pierce. However, in contrast to Gish and 
Pierce’s result, our bound does not require any continuity or decay conditions on the behavior of the source pdf—it 
holds for all source vectors having a pdf, having finite differential entropy, and having finite Renyi information 
dimension. 

Theorem 1 (Main Result): Let the source vector $\mathbf{X}$ have a pdf, and assume that $h(\mathbf{X})$ and
$H(\lfloor \mathbf{X} \rfloor)$ are finite. Then, the excess rate $\mathsf{R}_{r,d}$, as defined in (6), is lower-bounded by

\[ \mathsf{R}_{r,d} \ge \frac{d}{r} \log\left( \frac{\Gamma(1 + d/r)^{r/d}\, e}{1 + d/r} \right) \tag{15} \]

where $\Gamma(\cdot)$ denotes the Gamma function.

Proof: See Section IV. ■

In the one-dimensional case, (15) becomes

\[ \mathsf{R}_{r,1} \ge \frac{1}{r} \log\left( \frac{\Gamma(1 + 1/r)^{r}\, e}{1 + 1/r} \right). \tag{16} \]

As we shall see next, (16) can be achieved by a uniform quantizer, so in the one-dimensional case the lower bound
(15) is tight. Furthermore, for quadratic distortion, (16) is equal to $\frac{1}{2}\log(\pi e/6)$, hence it recovers the
excess rate obtained by Gish and Pierce.
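The bound (15) is easy to evaluate numerically. The sketch below assumes that (15) reads $\frac{d}{r}\log\bigl(\Gamma(1+d/r)^{r/d} e/(1+d/r)\bigr)$, which expands to $\log\Gamma(1+d/r) + \frac{d}{r}\bigl(1 - \log(1+d/r)\bigr)$; the function name is an illustrative choice.

```python
import math

def excess_rate_lower_bound(r, d):
    """Lower bound (15) on the excess rate, in nats per d-dimensional
    source vector, written via log-Gamma for numerical stability."""
    a = d / r
    return math.lgamma(1.0 + a) + a * (1.0 - math.log(1.0 + a))

# One-dimensional source with quadratic distortion: should match (7).
print(excess_rate_lower_bound(2, 1), 0.5 * math.log(math.pi * math.e / 6.0))

# Per-dimension values for quadratic distortion and a few dimensions.
for d in (1, 2, 4, 8):
    print(d, excess_rate_lower_bound(2, d) / d)
```

Dividing by $d$ in the loop expresses the bound per source dimension, which is the natural normalization when $d$ consecutive symbols of a one-dimensional source are quantized jointly.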

To demonstrate the tightness of (15) in the one-dimensional case, and to assess the accuracy of (15) in
higher-dimensional cases, we consider an upper bound on the excess rate that follows by restricting ourselves to the
class of tessellating quantizers. A polytope $\mathcal{P}$ is tessellating if there exists a partition of $\mathbb{R}^d$
consisting of translated and/or rotated copies of $\mathcal{P}$; a tessellating quantizer, denoted by
$q_{\mathcal{P}} \colon \mathcal{X} \to \hat{\mathcal{X}}$, is a quantizer whose quantization regions $\mathcal{S}_i$ are translated
and/or rotated copies of a tessellating convex polytope $\mathcal{P}$ and the corresponding reconstruction values
$\hat{\mathbf{x}}_i$ are the centroids of $\mathcal{S}_i$. A special case of a tessellating quantizer is a lattice quantizer,
i.e., a quantizer whose quantization regions are the Voronoi cells of a d-dimensional lattice. Note that in the
one-dimensional case the only convex polytope is the interval, so in this case the tessellating quantizer is the
uniform quantizer. For the class of tessellating quantizers, Linder and Zeger [15] derived an asymptotic expression
equivalent to (9).

Theorem 2 (Linder and Zeger [15, Theorem 1]): Let the source vector $\mathbf{X}$ have a pdf, and assume that
$h(\mathbf{X})$ and $H(\lfloor \mathbf{X} \rfloor)$ are finite. Then, a tessellating quantizer $q_{\mathcal{P}}(\cdot)$ with rth-power
distortion $\mathsf{E}[\|\mathbf{X} - q_{\mathcal{P}}(\mathbf{X})\|^r] = D$ and rate $R_{\mathcal{P}}(D) = H(q_{\mathcal{P}}(\mathbf{X}))$
satisfies

\[ \lim_{D \downarrow 0} e^{\frac{r}{d} R_{\mathcal{P}}(D)} D = \ell(\mathcal{P})\, e^{\frac{r}{d} h(\mathbf{X})} \tag{17} \]

where $\ell(\mathcal{P})$ denotes the normalized r-th moment of $\mathcal{P}$, defined as

\[ \ell(\mathcal{P}) \triangleq \frac{\int_{\mathcal{P}} \|\mathbf{x} - \hat{\mathbf{x}}\|^r \, d\mathbf{x}}{V(\mathcal{P})^{1 + r/d}} \tag{18} \]

and $V(\mathcal{P})$ denotes the volume of $\mathcal{P}$.

Remark: To be precise, [15, Theorem 1] requires that $H(q_{\mathcal{P}_\alpha}(\mathbf{X})) < \infty$ for some $\alpha > 0$ rather
than (13), i.e., $H(\lfloor \mathbf{X} \rfloor) < \infty$. (Here, $\mathcal{P}_\alpha = \{\mathbf{x} \in \mathbb{R}^d \colon \mathbf{x}/\alpha \in \mathcal{P}\}$
denotes the polytope $\mathcal{P}$ rescaled by $\alpha$.) Nevertheless, its proof hinges on a lemma by Csiszár (cf. [15,
Lemma 2]), which also applies if the condition $H(q_{\mathcal{P}_\alpha}(\mathbf{X})) < \infty$ is replaced by (13). Specifically,
by setting in [15, Lemma 2] the partition $\mathcal{B}_0 = \{B_1, B_2, \ldots\}$ of $\mathbb{R}^d$ to be the set of d-dimensional
cubes of unit volume with the lower-most corner point located at coordinates $\mathbf{i} \in \mathbb{Z}^d$, this partition
satisfies the lemma’s conditions provided that (13) holds.

Taking logarithms on both sides of (17), we obtain

\[ R_{\mathcal{P}}(D) = h(\mathbf{X}) + \frac{d}{r} \log \frac{1}{D} + \frac{d}{r} \log \ell(\mathcal{P}) + o_D(1). \tag{19} \]

Since a tessellating quantizer with rth-power distortion $D$ satisfies (3), the rate $R_{\mathcal{P}}(D)$ upper-bounds
$R_{r,d}(D)$. Furthermore, the rate-distortion function $R(D)$ can be lower-bounded as [16]

\[ R(D) \ge h(\mathbf{X}) + \frac{d}{r} \log \frac{1}{D} - \frac{d}{r} \log\left( \frac{r}{d} \bigl(V_d \Gamma(1 + d/r)\bigr)^{r/d} e \right) \tag{20} \]

where $V_d$ denotes the volume of the unit ball $\{\mathbf{x} \in \mathbb{R}^d \colon \|\mathbf{x}\| \le 1\}$. The right-hand side
(RHS) of (20) is referred to as the Shannon lower bound. It has been demonstrated that its difference to $R(D)$
vanishes as $D$ tends to zero, provided that the source distribution satisfies certain conditions; see, e.g., [7]-[9].
A finite-blocklength refinement of this bound can be found in [17], [18]. Recently, it has been demonstrated that for
sources with finite differential entropy the Shannon lower bound is asymptotically tight if, and only if,
$H(\lfloor \mathbf{X} \rfloor)$ is finite [9]. Thus, we have

\[ R(D) = h(\mathbf{X}) + \frac{d}{r} \log \frac{1}{D} - \frac{d}{r} \log\left( \frac{r}{d} \bigl(V_d \Gamma(1 + d/r)\bigr)^{r/d} e \right) + o_D(1) \tag{21} \]

for the class of sources considered in this paper.
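To make the constant in the Shannon lower bound concrete, the following sketch evaluates the RHS of (20) under the assumption that the norm is Euclidean, so that $V_d = \pi^{d/2}/\Gamma(1 + d/2)$; for other norms `unit_ball_volume` would have to be replaced. Both function names are illustrative.

```python
import math

def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in d dimensions (assumed norm)."""
    return math.pi ** (d / 2.0) / math.gamma(1.0 + d / 2.0)

def shannon_lower_bound(h, D, r, d):
    """RHS of (20) in nats: h + (d/r) log(1/D)
    - (d/r) log((r/d) (V_d Gamma(1 + d/r))^(r/d) e)."""
    a = d / r
    const = math.log(unit_ball_volume(d) * math.gamma(1.0 + a)) \
        + a * (math.log(1.0 / a) + 1.0)
    return h + a * math.log(1.0 / D) - const

# Unit-variance Gaussian, d = 1, r = 2: the bound equals 0.5*log(1/D).
h_gauss = 0.5 * math.log(2.0 * math.pi * math.e)
print(shannon_lower_bound(h_gauss, 0.01, 2, 1), 0.5 * math.log(100.0))
```

The Gaussian case is a convenient check because its rate-distortion function under quadratic distortion is known in closed form and coincides with the Shannon lower bound for $D$ below the source variance.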

Combining (19) with (21), we obtain

\[ \lim_{D \downarrow 0} \bigl\{ R_{\mathcal{P}}(D) - R(D) \bigr\} = \frac{d}{r} \log\left( \frac{r}{d} \bigl(V_d \Gamma(1 + d/r)\bigr)^{r/d} e \right) + \frac{d}{r} \log \ell(\mathcal{P}). \tag{22} \]

Recalling that $R_{r,d}(D) \le R_{\mathcal{P}}(D)$ for every $D$, this yields

\[ \mathsf{R}_{r,d} \le \frac{d}{r} \log\left( \frac{r}{d} \bigl(V_d \Gamma(1 + d/r)\bigr)^{r/d} e \right) + \inf_{\mathcal{P}} \frac{d}{r} \log \ell(\mathcal{P}) \tag{23} \]

where the infimum is over all d-dimensional, tessellating, convex polytopes $\mathcal{P}$.

Using that in the one-dimensional case the only convex polytope is the interval, and noting that the interval has
the normalized r-th moment

\[ \ell(\mathcal{P}) = \frac{1}{2^r (1 + r)} \tag{24} \]

the upper bound (23) becomes in this case

\[ \mathsf{R}_{r,1} \le \frac{1}{r} \log\left( \frac{\Gamma(1 + 1/r)^{r}\, e}{1 + 1/r} \right) \tag{25} \]

which coincides with (16). Thus, in the one-dimensional case a tessellating quantizer (which in this case is the
uniform quantizer) is asymptotically optimal.
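The step from (23) to (25) can be checked numerically. The sketch below first approximates the normalized r-th moment of an interval about its midpoint and compares it with $1/(2^r(1+r))$, and then evaluates the one-dimensional upper bound (with $V_1 = 2$) next to the lower bound (16). All function names and the grid size are illustrative assumptions.

```python
import math

def interval_moment(r, length=1.0, grid=20000):
    """Normalized r-th moment of an interval about its midpoint,
    i.e. (integral of |x|^r over the interval) / length^(1 + r),
    approximated with the midpoint rule."""
    h = length / grid
    integral = sum(abs(-0.5 * length + (j + 0.5) * h) ** r
                   for j in range(grid)) * h
    return integral / length ** (1.0 + r)

def upper_bound_1d(r):
    """(23) for d = 1: (1/r) log(r (2 Gamma(1 + 1/r))^r e) + (1/r) log ell."""
    ell = 1.0 / (2.0 ** r * (1.0 + r))
    return (math.log(r) + r * math.log(2.0 * math.gamma(1.0 + 1.0 / r))
            + 1.0 + math.log(ell)) / r

def lower_bound_1d(r):
    """(16): (1/r) log(Gamma(1 + 1/r)^r e / (1 + 1/r))."""
    return math.lgamma(1.0 + 1.0 / r) + (1.0 - math.log(1.0 + 1.0 / r)) / r

for r in (1, 2, 3):
    print(r, interval_moment(r), 1.0 / (2.0 ** r * (1.0 + r)),
          upper_bound_1d(r), lower_bound_1d(r))
```

For each $r$ the last two printed values should coincide up to rounding, which is the numerical counterpart of the statement that (25) coincides with (16).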


III. Derivation for One-Dimensional Sources and Certain Quantizers 

Before proving Theorem 1, we provide a simplified derivation of the lower bound (15) for one-dimensional
sources ($d = 1$) and quadratic distortion ($r = 2$) that will serve as an outline for the complete proof of Theorem 1
given in Section IV. Particularized to this setting, Theorem 1 becomes

\[ \mathsf{R}_{2,1} \ge \frac{1}{2} \log \frac{\pi e}{6}. \tag{26} \]

In our derivation we shall only consider quantizers satisfying

\[ \sup_i \sup_{x \in \mathcal{S}_i} (x - \hat{x}_i)^2 \le \alpha D, \quad \text{for some constant } \alpha. \tag{27} \]

This simplifying assumption is, for example, satisfied by the uniform quantizer when $\hat{x}_i$ is the midpoint of
$\mathcal{S}_i$ and the cell length $\Delta$ vanishes proportionally to $\sqrt{D}$. However, it is prima facie unclear
whether (27) holds without loss of optimality for general sources.

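By (5), we have

\[ R_{2,1}(D) = \inf_{q(\cdot)} I(X; \hat{X}) = h(X) - \sup_{q(\cdot)} h(X|\hat{X}). \tag{28} \]

We upper-bound $h(X|\hat{X})$ by using that, conditioned on $\hat{X} = \hat{x}_i$, the support of $X$ is $\mathcal{S}_i$, so a
uniform distribution over $\mathcal{S}_i$ maximizes the differential entropy [2, Theorem 11.1.1]:

\[ h(X|\hat{X} = \hat{x}_i) \le \log \Delta_i. \tag{29} \]

Averaging over $\hat{X}$ then yields

\[ R_{2,1}(D) \ge h(X) - \sup_{q(\cdot)} \sum_i p_i \log \Delta_i. \tag{30} \]

By Jensen’s inequality, this can be further lower-bounded by

\[ R_{2,1}(D) \ge h(X) - \sup_{q(\cdot)} \frac{1}{2} \log\Bigl( \sum_i p_i \Delta_i^2 \Bigr). \tag{31} \]

Together with (11), this yields

\[ \varlimsup_{D \downarrow 0} \bigl\{ R_{2,1}(D) - R(D) \bigr\} \ge \varlimsup_{D \downarrow 0} \Bigl\{ \frac{1}{2} \log D + \frac{1}{2} \log(2\pi e) - \sup_{q(\cdot)} \frac{1}{2} \log\Bigl( \sum_i p_i \Delta_i^2 \Bigr) \Bigr\}. \tag{32} \]

In order to prove (26), it remains to show that, for any sequence of quantizers (parametrized by $D$),

\[ \varlimsup_{D \downarrow 0} \sum_i p_i \frac{\Delta_i^2}{D} \le 12. \tag{33} \]

Then the RHS of (32) is lower-bounded by $\frac{1}{2}\log(\pi e/6)$ and we obtain (26) upon noting that the left-hand side
(LHS) of (32) is equal to $\mathsf{R}_{2,1}$. Hence we recover Theorem 1 for one-dimensional sources and quadratic distortion.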
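The upper bound (33) follows along the lines of the proof of [15, Lemma 1]. We first express
$\mathsf{E}[(X - \hat{X})^2]$ as

\[ \mathsf{E}\bigl[(X - \hat{X})^2\bigr] = \sum_i \int_{\mathcal{S}_i} f_X(x) (x - \hat{x}_i)^2 \, dx
 = \sum_i \frac{p_i}{\Delta_i} \int_{\mathcal{S}_i} (x - \hat{x}_i)^2 \, dx - \sum_i \int_{\mathcal{S}_i} \Bigl[ \frac{p_i}{\Delta_i} - f_X(x) \Bigr] (x - \hat{x}_i)^2 \, dx. \tag{34} \]

We next note that the region $\mathcal{S}_i$ of measure $\Delta_i$ that minimizes $\int_{\mathcal{S}_i} (x - \hat{x}_i)^2 \, dx$ is
the interval $[\hat{x}_i - \Delta_i/2, \hat{x}_i + \Delta_i/2]$, so

\[ \frac{1}{\Delta_i} \int_{\mathcal{S}_i} (x - \hat{x}_i)^2 \, dx \ge \frac{\Delta_i^2}{12}. \tag{35} \]

The first term on the RHS of (34) can therefore be lower-bounded by

\[ \sum_i \frac{p_i}{\Delta_i} \int_{\mathcal{S}_i} (x - \hat{x}_i)^2 \, dx \ge \sum_i p_i \frac{\Delta_i^2}{12}. \tag{36} \]

To evaluate the second term on the RHS of (34), we introduce the piecewise-constant pdf

\[ f^{(D)}(x) \triangleq \sum_i \frac{p_i}{\Delta_i} \mathbf{1}\{x \in \mathcal{S}_i\}, \quad x \in \mathbb{R}. \tag{37} \]

With this, we can upper-bound the second term on the RHS of (34) as

\[ \sum_i \int_{\mathcal{S}_i} \Bigl[ \frac{p_i}{\Delta_i} - f_X(x) \Bigr] (x - \hat{x}_i)^2 \, dx
 = \sum_i \int_{\mathcal{S}_i} \bigl[ f^{(D)}(x) - f_X(x) \bigr] (x - \hat{x}_i)^2 \, dx
 \le \alpha D \int \bigl| f^{(D)}(x) - f_X(x) \bigr| \, dx \tag{38} \]

since, by (27), we have $\sup_i \sup_{x \in \mathcal{S}_i} (x - \hat{x}_i)^2 \le \alpha D$.

By Lebesgue’s differentiation theorem, $f^{(D)}$ converges to $f_X$ almost everywhere as $\sup_i \Delta_i \to 0$. It
therefore follows from Scheffé’s Lemma [19, Theorem 16.12] that

\[ \lim_{D \downarrow 0} \int \bigl| f^{(D)}(x) - f_X(x) \bigr| \, dx = 0. \tag{39} \]

Combining (36) and (38) with (34), and using that $\mathsf{E}[(X - \hat{X})^2] \le D$, we obtain

\[ \sum_i p_i \Delta_i^2 \le 12 D \Bigl( 1 + \alpha \int \bigl| f^{(D)}(x) - f_X(x) \bigr| \, dx \Bigr). \tag{40} \]

Together with (39) this proves (33).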
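The bound (33) can be illustrated on a source whose pdf is discontinuous. The sketch below applies a uniform quantizer with midpoint reconstruction to a unit-rate exponential source; since all cells have the same length, $\sum_i p_i \Delta_i^2$ equals the squared step size, and the ratio to the distortion should approach 12 from below as the step shrinks. The choice of source, the function name, and the grid size are assumptions made for illustration only.

```python
import math

def uniform_quantizer_ratio(step, grid=2000):
    """Return sum_i p_i * step**2 / D for an Exp(1) source quantized with
    cells [k*step, (k+1)*step), k >= 0, and midpoint reconstruction.
    Every cell integral equals the k = 0 cell integral scaled by
    exp(-k*step), so one cell and a geometric series suffice."""
    h = step / grid
    # Midpoint-rule value of the integral of (u - step/2)^2 * exp(-u)
    # over the first cell [0, step).
    cell = sum(((j + 0.5) * h - 0.5 * step) ** 2 * math.exp(-(j + 0.5) * h)
               for j in range(grid)) * h
    distortion = cell / (1.0 - math.exp(-step))
    return step ** 2 / distortion

for step in (1.0, 0.3, 0.1):
    print(step, uniform_quantizer_ratio(step))
```

This quantizer satisfies (27) with $\alpha$ close to 3 for small steps, because the largest squared error in a cell is $\Delta^2/4$ while the distortion is close to $\Delta^2/12$.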
IV. Proof of Theorem 1

A. Variational Entropy Inequality and Auxiliary Results 

The above back-of-the-envelope derivation directly generalizes to multi-dimensional sources and rth-power
distortion. In order to prove Theorem 1, it would remain to show that (27) holds without loss of optimality.
Unfortunately, for general sources this appears to be a difficult task. Indeed, the quantization regions of the optimal
quantizer are difficult to characterize since the optimal quantizer (and hence the number of quantization regions
together with their locations and volumes) changes with $D$. To sidestep this problem, we replace (29) by an upper
bound on $h(\mathbf{X}|\hat{\mathbf{X}} = \hat{\mathbf{x}}_i)$ that is based on the following variational bound on differential
entropy.

Lemma 3: Let $f$ and $g$ be arbitrary pdfs. If $-\int f(\mathbf{x}) \log f(\mathbf{x}) \, d\mathbf{x}$ is finite, then
$-\int f(\mathbf{x}) \log g(\mathbf{x}) \, d\mathbf{x}$ exists and

\[ -\int f(\mathbf{x}) \log f(\mathbf{x}) \, d\mathbf{x} \le -\int f(\mathbf{x}) \log g(\mathbf{x}) \, d\mathbf{x} \tag{41} \]

with equality if, and only if, $f(\mathbf{x}) = g(\mathbf{x})$ almost everywhere.

Proof: See [20, Lemma 8.3.1]. ■

The inequality (41) is a direct consequence of the information inequality. Lemma 3 is also reminiscent of [21,
Theorem 5.1], which provides an upper bound on the mutual information between a channel input $X$ and a channel
output $Y$ and holds for general random variables. In fact, when $X$ is a real-valued random variable and the
conditional distribution of $Y$ given $X$ is absolutely continuous with respect to the Lebesgue measure, then [21,
Theorem 5.1] essentially provides an upper bound on $h(Y)$ that is of the form (41).

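Lemma 3 allows us to upper-bound differential entropy by replacing the true pdf $f$ inside the logarithm by an
auxiliary pdf $g$. In order to upper-bound the conditional differential entropy
$h(\mathbf{X}|\hat{\mathbf{X}} = \hat{\mathbf{x}}_i)$, we apply Lemma 3 with the auxiliary conditional pdf

\[ g_{\mathbf{X}|\hat{\mathbf{X}}}(\mathbf{x}|\hat{\mathbf{x}}_i) = \frac{1}{K_{i,\epsilon}} \mathbf{1}\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\}
 + \frac{r}{\delta^{d/r} K_{i,\epsilon}}\, e^{-\frac{\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r}{D\delta}} \mathbf{1}\{\mathbf{x} \in \mathcal{B}^{c}_{i,\epsilon}\}, \quad \mathbf{x} \in \mathcal{S}_i \tag{42} \]

where

\[ \mathcal{B}_{i,\epsilon} \triangleq \{\mathbf{x} \in \mathcal{S}_i \colon \|\mathbf{x} - \hat{\mathbf{x}}_i\| < \epsilon\}, \tag{43a} \]

\[ \mathcal{B}^{c}_{i,\epsilon} \triangleq \{\mathbf{x} \in \mathcal{S}_i \colon \|\mathbf{x} - \hat{\mathbf{x}}_i\| \ge \epsilon\}, \tag{43b} \]

\[ K_{i,\epsilon} \triangleq \Delta_{i,\epsilon} + \frac{r}{\delta^{d/r}} \int_{\mathcal{B}^{c}_{i,\epsilon}} e^{-\frac{\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r}{D\delta}} \, d\mathbf{x}, \tag{43c} \]

$\Delta_{i,\epsilon}$ denotes the Lebesgue measure of $\mathcal{B}_{i,\epsilon}$, and $\delta$ and $\epsilon$ are parameters to be
specified later.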

This conditional pdf of $\mathbf{X}$ given $\hat{\mathbf{X}}$ is uniform on a set of measure $\Delta_{i,\epsilon}$ around
$\hat{\mathbf{x}}_i$ and then decays exponentially. Intuitively, if $\epsilon^r$ decays more slowly than $D$ as $D$ tends to
zero, then with high probability $\mathbf{X}$ lies in $\mathcal{B}_{i,\epsilon}$ and the upper bound obtained from Lemma 3 is
essentially equivalent to (29) but with $\Delta_i$ replaced by $\Delta_{i,\epsilon}$. Our choice of
$g_{\mathbf{X}|\hat{\mathbf{X}}}$ allows us to control the contribution of $\mathbf{x}$’s lying outside of
$\mathcal{B}_{i,\epsilon}$. We next need to show that

i 

which corresponds to (33) generalized to arbitrary $d$ and $r$, but with $\Delta_i$ replaced by $\Delta_{i,\epsilon}$. By
construction of $\mathcal{B}_{i,\epsilon}$ we have that
$\sup_i \sup_{\mathbf{x} \in \mathcal{B}_{i,\epsilon}} \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \le \epsilon^r$, so $\mathcal{B}_{i,\epsilon}$
satisfies (27) upon choosing $\epsilon^r = D/\kappa$ (for some constant $\kappa$). The claim (44) follows therefore
immediately from the steps (34)-(40). Thus, by using Lemma 3 together with (42), we can replace $\Delta_i$ (whose
behavior as a function of $D$ is unknown) by $\Delta_{i,\epsilon}$ (whose behavior can be controlled by cleverly
choosing $\epsilon$).

Before we set out to prove Theorem 1, we first provide a number of auxiliary results that we shall need throughout 
the proof. The proof of Theorem 1 is then given in Section IV-B. 

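Lemma 4: The normalizing constant $K_{i,\epsilon}$ is upper-bounded by

\[ K_{i,\epsilon} \le \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{\epsilon^r}{D\delta} \Bigr) \le \epsilon^d V_d + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r} \Bigr) \tag{45} \]

where $\Gamma(\cdot, \cdot)$ denotes the upper incomplete Gamma function.

Proof: The first inequality in (45) follows from the definition of $K_{i,\epsilon}$ in (43c) and by upper-bounding the
integral on the RHS of (43c). Indeed, since
$\mathcal{B}^{c}_{i,\epsilon} \subseteq \{\mathbf{x} \in \mathbb{R}^d \colon \|\mathbf{x} - \hat{\mathbf{x}}_i\| \ge \epsilon\}$,

\[ \frac{r}{\delta^{d/r}} \int_{\mathcal{B}^{c}_{i,\epsilon}} e^{-\frac{\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r}{D\delta}} \, d\mathbf{x}
 \le \frac{r}{\delta^{d/r}} \int_{\|\mathbf{x} - \hat{\mathbf{x}}_i\| \ge \epsilon} e^{-\frac{\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r}{D\delta}} \, d\mathbf{x}
 = \frac{r}{\delta^{d/r}}\, d V_d \int_{\epsilon}^{\infty} \rho^{d-1} e^{-\frac{\rho^r}{D\delta}} \, d\rho
 = d V_d D^{d/r} \int_{\frac{\epsilon^r}{D\delta}}^{\infty} \xi^{\frac{d}{r} - 1} e^{-\xi} \, d\xi \tag{46} \]

where the second step follows by writing $\mathbf{x} - \hat{\mathbf{x}}_i$ in polar coordinates and by using that the surface
area of the d-dimensional ball of radius $\rho = \|\mathbf{x} - \hat{\mathbf{x}}_i\|$ is $d V_d \rho^{d-1}$ (see, e.g., [16, Eq.
(10)]), and the third step follows by the change of variable $\xi = \rho^r/(D\delta)$.

The second inequality in (45) follows by upper-bounding (see, e.g., [16, Eq. (7)])

\[ \Delta_{i,\epsilon} \le \int_{\|\mathbf{x} - \hat{\mathbf{x}}_i\| < \epsilon} d\mathbf{x} = \epsilon^d V_d \tag{47} \]

and $\Gamma(d/r, x) \le \Gamma(d/r)$, $x \ge 0$.

■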
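The polar-coordinate computation in the proof of Lemma 4 can be checked numerically in the simplest case $d = 1$, $r = 2$, where $V_1 = 2$ and $\Gamma(1/2, z) = \sqrt{\pi}\,\mathrm{erfc}(\sqrt{z})$. The sketch below assumes that the prefactor of the integral in (43c) is $r/\delta^{d/r}$ and that the exponent is $-\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r/(D\delta)$; the function names, the truncation point, and the parameter values are illustrative.

```python
import math

def tail_integral(eps, D, delta, grid=50000):
    """Midpoint-rule value of (r / delta^(d/r)) times the integral of
    exp(-|x|^r / (D*delta)) over |x| >= eps, for d = 1 and r = 2.
    The integral is truncated 12 scale lengths beyond eps, where the
    integrand is below exp(-144)."""
    scale = math.sqrt(D * delta)
    h = 12.0 * scale / grid
    one_side = sum(math.exp(-((eps + (j + 0.5) * h) ** 2) / (D * delta))
                   for j in range(grid)) * h
    return (2.0 / math.sqrt(delta)) * 2.0 * one_side

def closed_form(eps, D, delta):
    """d * V_d * D^(d/r) * Gamma(d/r, eps^r / (D*delta)) for d = 1, r = 2,
    using Gamma(1/2, z) = sqrt(pi) * erfc(sqrt(z))."""
    return 2.0 * math.sqrt(D) * math.sqrt(math.pi) \
        * math.erfc(eps / math.sqrt(D * delta))

print(tail_integral(0.3, 0.01, 4.0), closed_form(0.3, 0.01, 4.0))
```

Agreement of the two values confirms that, with this prefactor, the powers of $\delta$ cancel after the change of variable $\xi = \rho^r/(D\delta)$, leaving the factor $D^{d/r}$ that appears in (45).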

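Lemma 5: The set $\mathcal{B}^{c}_{i,\epsilon}$ satisfies

\[ \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\bigr) \le \frac{D}{\epsilon^r} \tag{48a} \]

\[ \sum_i \mathsf{E}\bigl[\|\mathbf{X} - \hat{\mathbf{x}}_i\|^r \mathbf{1}\{\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\}\bigr] \le D. \tag{48b} \]

Proof: We first prove (48a). By the distortion constraint (3), and since $\mathcal{B}^{c}_{i,\epsilon} \subseteq \mathcal{S}_i$ and
$\|\mathbf{x} - \hat{\mathbf{x}}_i\| \ge \epsilon$ for $\mathbf{x} \in \mathcal{B}^{c}_{i,\epsilon}$, we have

\[ D \ge \sum_i \int_{\mathcal{S}_i} f_{\mathbf{X}}(\mathbf{x}) \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \, d\mathbf{x}
 \ge \sum_i \int_{\mathcal{B}^{c}_{i,\epsilon}} f_{\mathbf{X}}(\mathbf{x}) \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \, d\mathbf{x}
 \ge \sum_i \int_{\mathcal{B}^{c}_{i,\epsilon}} f_{\mathbf{X}}(\mathbf{x})\, \epsilon^r \, d\mathbf{x}. \tag{49} \]

Using that $\epsilon$ neither depends on $i$ nor on $\mathbf{x}$, (48a) follows by dividing both sides of (49) by
$\epsilon^r$.

To prove (48b) we use again the distortion constraint (3) and that $\mathcal{B}^{c}_{i,\epsilon} \subseteq \mathcal{S}_i$ to obtain

\[ \sum_i \mathsf{E}\bigl[\|\mathbf{X} - \hat{\mathbf{x}}_i\|^r \mathbf{1}\{\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\}\bigr]
 = \sum_i \int_{\mathcal{B}^{c}_{i,\epsilon}} f_{\mathbf{X}}(\mathbf{x}) \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \, d\mathbf{x}
 \le \mathsf{E}\bigl[\|\mathbf{X} - \hat{\mathbf{X}}\|^r\bigr] \le D. \tag{50} \]

■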


B. Proof of Theorem 1 

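Expanding $I(\mathbf{X}; \hat{\mathbf{X}})$ as $h(\mathbf{X}) - h(\mathbf{X}|\hat{\mathbf{X}})$, we obtain from (5) and (21) that the
excess rate can be expressed as

\[ \mathsf{R}_{r,d} = \varlimsup_{D \downarrow 0} \Bigl\{ \frac{d}{r} \log D + \frac{d}{r} \log\Bigl( \frac{r}{d} \bigl(V_d \Gamma(1 + d/r)\bigr)^{r/d} e \Bigr) - \sup_{q(\cdot)} h(\mathbf{X}|\hat{\mathbf{X}}) \Bigr\}. \tag{51} \]

To derive the lower bound (15) given in Theorem 1, it remains to show that

\[ \varlimsup_{D \downarrow 0} \Bigl\{ \sup_{q(\cdot)} h(\mathbf{X}|\hat{\mathbf{X}}) - \frac{d}{r} \log D \Bigr\} \le \frac{d}{r} \log\Bigl( V_d^{r/d} \Bigl( 1 + \frac{r}{d} \Bigr) \Bigr). \tag{52} \]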


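To this end, we upper-bound the conditional differential entropy $h(\mathbf{X}|\hat{\mathbf{X}})$ using Lemma 3 together
with (42). This yields for every $\hat{\mathbf{X}} = \hat{\mathbf{x}}_i$

\[ h(\mathbf{X}|\hat{\mathbf{X}} = \hat{\mathbf{x}}_i) \le \log K_{i,\epsilon} - \mathsf{E}\Bigl[ \log\Bigl( \frac{r}{\delta^{d/r}} e^{-\frac{\|\mathbf{X} - \hat{\mathbf{x}}_i\|^r}{D\delta}} \Bigr) \mathbf{1}\{\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\} \Bigm| \mathbf{X} \in \mathcal{S}_i \Bigr] \]
\[ \le \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{\epsilon^r}{D\delta} \Bigr) \Bigr)
 + \Bigl| \log \frac{r}{\delta^{d/r}} \Bigr| \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon} \bigm| \mathbf{X} \in \mathcal{S}_i\bigr)
 + \frac{1}{D\delta} \mathsf{E}\bigl[ \|\mathbf{X} - \hat{\mathbf{x}}_i\|^r \mathbf{1}\{\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\} \bigm| \mathbf{X} \in \mathcal{S}_i \bigr] \tag{53} \]

where the second inequality follows from the bound on $K_{i,\epsilon}$ presented in Lemma 4 and by upper-bounding
$-\log(r/\delta^{d/r}) \le |\log(r/\delta^{d/r})|$. Averaging over $\hat{\mathbf{X}}$ then yields

\[ h(\mathbf{X}|\hat{\mathbf{X}}) \le \sum_i p_i \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{\epsilon^r}{D\delta} \Bigr) \Bigr)
 + \Bigl| \log \frac{r}{\delta^{d/r}} \Bigr| \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\bigr)
 + \frac{1}{D\delta} \sum_i \mathsf{E}\bigl[ \|\mathbf{X} - \hat{\mathbf{x}}_i\|^r \mathbf{1}\{\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\} \bigr]. \tag{54} \]

By Lemma 5, this can be further upper-bounded by

\[ h(\mathbf{X}|\hat{\mathbf{X}}) \le \sum_i p_i \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{\epsilon^r}{D\delta} \Bigr) \Bigr)
 + \Bigl| \log \frac{r}{\delta^{d/r}} \Bigr| \frac{D}{\epsilon^r} + \frac{1}{\delta}. \tag{55} \]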

We next choose

\[ \epsilon^r = \frac{D}{\kappa} \tag{56} \]

for some $\kappa > 0$ that we will let tend to zero at the end of the proof. For ease of exposition, we do not always
make this choice explicit in the notation but write $\epsilon^r$ or $D/\kappa$ depending on which is more convenient.

With this choice, the second term on the RHS of (55) becomes $\kappa\, |\log(r/\delta^{d/r})|$. To evaluate the first
term on the RHS of (55), we express $p_i$ as

\[ p_i = \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) + \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\bigr) \tag{57} \]

and define

\[ p_\epsilon \triangleq \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\bigr). \tag{58} \]

By Lemma 5, we have

\[ p_\epsilon \le \kappa \tag{59} \]

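which vanishes as we let $\kappa$ tend to zero. With the above definition, and applying the second inequality in (45)
(Lemma 4), we obtain for the first term on the RHS of (55) that

\[ \sum_i p_i \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)
 = \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)
 + \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}^{c}_{i,\epsilon}\bigr) \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr) \]
\[ \le \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)
 + p_\epsilon \log\Bigl( \Bigl( \frac{D}{\kappa} \Bigr)^{d/r} V_d + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r} \Bigr) \Bigr). \tag{60} \]

Using (59) and that $\sum_i \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) + p_\epsilon = 1$, (60) becomes

\[ \sum_i p_i \log\Bigl( \Delta_{i,\epsilon} + d V_d D^{d/r}\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)
 \le \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \log\Bigl( \frac{\Delta_{i,\epsilon}}{D^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)
 + p_\epsilon \log\Bigl( \frac{V_d}{\kappa^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r} \Bigr) \Bigr) + \frac{d}{r} \log D \]
\[ = \frac{d}{r} \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \log\Bigl( \frac{\Delta_{i,\epsilon}}{D^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d}
 + p_\epsilon \log\Bigl( \frac{V_d}{\kappa^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r} \Bigr) \Bigr) + \frac{d}{r} \log D. \tag{61} \]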

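By Jensen’s inequality, the first term on the RHS of (61) is upper-bounded by

\[ \frac{d}{r} \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \log\Bigl( \frac{\Delta_{i,\epsilon}}{D^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d}
 \le (1 - p_\epsilon) \frac{d}{r} \log\Bigl( \frac{1}{1 - p_\epsilon} \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \Bigl( \frac{\Delta_{i,\epsilon}}{D^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d} \Bigr). \tag{62} \]

For $r/d \le 1$, we have $(x + a)^{r/d} \le x^{r/d} + a^{r/d}$ for every $x, a > 0$; for $r/d > 1$, the function
$x \mapsto (x^{d/r} + a)^{r/d}$ is concave for every $a > 0$. Consequently,

\[ \frac{1}{1 - p_\epsilon} \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \Bigl( \frac{\Delta_{i,\epsilon}}{D^{d/r}} + d V_d\, \Gamma\Bigl( \frac{d}{r}, \frac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d}
 \le \begin{cases}
 \dfrac{1}{1 - p_\epsilon} \displaystyle\sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \dfrac{\Delta_{i,\epsilon}^{r/d}}{D} + \Bigl( d V_d\, \Gamma\Bigl( \dfrac{d}{r}, \dfrac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d}, & r/d \le 1 \\[2ex]
 \Bigl( \Bigl( \dfrac{1}{1 - p_\epsilon} \displaystyle\sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \dfrac{\Delta_{i,\epsilon}^{r/d}}{D} \Bigr)^{d/r} + d V_d\, \Gamma\Bigl( \dfrac{d}{r}, \dfrac{1}{\kappa\delta} \Bigr) \Bigr)^{r/d}, & r/d > 1
 \end{cases} \tag{63} \]

where the upper bound for $r/d > 1$ follows from Jensen’s inequality.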

We next generalize (33), namely,

\[ \varlimsup_{D \downarrow 0} \sum_i p_i \frac{\Delta_i^2}{D} \le 12 \tag{64} \]

to the d-dimensional sets $\mathcal{B}_{i,\epsilon}$ of Lebesgue measure $\Delta_{i,\epsilon}$. To this end, we follow
essentially the steps (34)-(40) in Section III with $\mathcal{S}_i$ replaced by $\mathcal{B}_{i,\epsilon}$ and with $\Delta_i$
replaced by $\Delta_{i,\epsilon}$. However, (39) is based on Lebesgue’s differentiation theorem, which requires that the
families of sets (parametrized by $D$) have bounded eccentricity.² Since $\mathcal{B}_{i,\epsilon}$ is the intersection of
$\mathcal{S}_i$ with the d-dimensional ball of radius $\epsilon$ centered at $\hat{\mathbf{x}}_i$, cf. (43a), and since
$\mathcal{S}_i$ is arbitrary, the sets $\mathcal{B}_{i,\epsilon}$ may not fulfill this condition. In the one-dimensional case,
a sufficient condition for $\mathcal{B}_{i,\epsilon}$ having bounded eccentricity would be that, for every distortion $D$,
the quantization regions $\mathcal{S}_i$ are convex. This in turn can be assumed without loss of optimality, e.g., for
quadratic distortion and sources with well-behaved pdfs [11]. However, for one-dimensional sources with general pdfs,
or for higher-dimensional sources, assuming convex quantization regions may be too restrictive. Fortunately, the
families of sets $\mathcal{B}_{i,\epsilon}$ that do not have bounded eccentricity can be disregarded without affecting the
final result. The inequality (33) can therefore be generalized to the case at hand without imposing any additional
constraints on the quantization regions $\mathcal{S}_i$, $i \in \mathbb{Z}$ or the source pdf $f_{\mathbf{X}}$. The result is
stated in the following lemma.

Lemma 6: Let the sets $\mathcal{B}_{i,\epsilon}$, $i \in \mathbb{Z}$ be defined as in (43a), and let $\Delta_{i,\epsilon}$,
$i \in \mathbb{Z}$ denote the Lebesgue measures of these sets. Assume that $\epsilon^r = D/\kappa$. Then, for every
$\kappa > 0$,

\[ \varlimsup_{D \downarrow 0} \sum_i \Pr\bigl(\mathbf{X} \in \mathcal{B}_{i,\epsilon}\bigr) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \le V_d^{r/d} \Bigl( 1 + \frac{r}{d} \Bigr). \tag{65} \]

Proof: See Appendix A. ■

Combining Lemma 6 with (55)-(63), and bounding $0 \le p_\epsilon \le \kappa$, we obtain that

lim ^ 

DIO 


|»P/.(X|X) - ^log d| < ;?log (^,py'J 

+ Klog + dVdT(d/r)^ + k 




-I- for r/c? < 1 
0 


(66a) 


and 


lim I sup /i(X|X) - - log D i < log 
DiO ,(.) r J 


y^i^^r/dr\dVdT(^-,i, 


(1 - 


ttlog 


Vd 

^dfr 


dVdT{d/r) ) + K 




r k6 


-I- -r, for r/d > 1. 
0 


(66b) 


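Using that $\lim_{\xi \to \infty} \Gamma(d/r, \xi) = 0$ and $\lim_{\xi \downarrow 0} \xi \log(\alpha/\xi^{d/r} + \beta) = 0$ (for
any $\alpha, \beta > 0$), letting $\kappa \to 0$ yields

\[ \varlimsup_{D \downarrow 0} \Bigl\{ \sup_{q(\cdot)} h(\mathbf{X}|\hat{\mathbf{X}}) - \frac{d}{r} \log D \Bigr\} \le \frac{d}{r} \log\Bigl( V_d^{r/d} \Bigl( 1 + \frac{r}{d} \Bigr) \Bigr) + \frac{1}{\delta}. \tag{67} \]

This in turn proves (52) upon letting $\delta \to \infty$ and concludes the proof of Theorem 1.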


V. Asymptotically Optimal Quantizers 

As mentioned at the end of Section II, in the one-dimensional case uniform quantizers with cells of length
$2\bigl((1 + r) D\bigr)^{1/r}$ achieve the asymptotic excess rate $\mathsf{R}_{r,1}$. Hence, uniform quantizers are
asymptotically optimal as the allowed distortion tends to zero. One may wonder whether every sequence of quantizers
achieving $\mathsf{R}_{r,1}$ must converge to a uniform quantizer as $D \to 0$, or whether uniform quantizers are merely a
convenient choice and other quantizers with vanishing cells are also asymptotically optimal. In this section, we
partially address this question by presenting in Theorem 7 a necessary condition for the asymptotic optimality of a
sequence of quantizers (parametrized by $D$). We then apply this condition to the family of almost-regular quantizers.

²A family $\mathcal{F}$ of sets is said to have bounded eccentricity if there exists a constant $c > 0$ such that for
every $\mathcal{S} \in \mathcal{F}$ the Lebesgue measure of $\mathcal{S}$ is not smaller than $c$ times the volume of the
smallest ball containing $\mathcal{S}$.

Theorem 7: Suppose the sequence of quantizers q{-) (parametrized by D) with quantization regions Si, i G Z 
satisfying the distortion constraint E[|7f — g(X)|’'] < D achieves the asymptotic excess distortion 

T(1 + l/rfe^ 


lim{i7(g(X)) - R{D)] = - log 
Dio r V i 


1/r 


Then, 


lim limy^PriX € 5^)1 

p->oo ^ 


A’ 




D 


-2’’(l+r) 


< t? > = 1, for every > 0. 


( 68 ) 


(69) 


Here, Aj p£,\/T denotes the Lebesgue measure of in (43a) for e = . 

Proof: This result is a direct consequence of Jensen’s inequality applied in (62) in the proof of Theorem 1. 
See Appendix B for a detailed proof. ■ 

If we interpret the quantizer as a random variable that takes on the value $\mathcal{S}_i$ with probability $\Pr(X \in \mathcal{S}_i)$, then Theorem 7 can be paraphrased as follows: “A sequence of quantizers achieves the asymptotic excess rate $R_{r,1}$ only if $\Delta_{i,\rho D^{1/r}}$ converges in probability to $2(1+r)^{1/r} D^{1/r}$ as $D \to 0$ and $\rho \to \infty$.”
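As a numerical illustration of the benchmark that Theorem 7 refers to, the following sketch evaluates a uniform quantizer with cell length $2((1+r)D)^{1/r}$ for $r = 2$. It rests on assumptions that are not part of the paper: the source is a unit-variance Gaussian, for which $R(D) = \frac{1}{2}\log(1/D)$ nats, the function names are hypothetical, and $D$ is the nominal per-cell distortion (the true distortion differs from it by a term that vanishes as $D \to 0$).

```python
import math

def uniform_cell_length(D, r):
    """Length 2*((1+r)*D)**(1/r): an interval of this length has mean
    r-th power distortion D around its midpoint (uniform density)."""
    return 2.0 * ((1.0 + r) * D) ** (1.0 / r)

def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def uniform_quantizer_entropy(delta, span=10.0):
    """Entropy (nats) of a uniform quantizer with cells
    [(k - 1/2)*delta, (k + 1/2)*delta) applied to X ~ N(0, 1)."""
    k_max = int(math.ceil(span / delta))
    entropy = 0.0
    for k in range(-k_max, k_max + 1):
        p = gaussian_cdf((k + 0.5) * delta) - gaussian_cdf((k - 0.5) * delta)
        if p > 0.0:  # skip cells whose probability underflows
            entropy -= p * math.log(p)
    return entropy

def excess_rate_gaussian(D):
    """H(q(X)) - R(D) in nats for r = 2 and a unit-variance Gaussian
    source, with R(D) = 0.5*log(1/D) (assumes D <= 1)."""
    delta = uniform_cell_length(D, r=2)
    return uniform_quantizer_entropy(delta) - 0.5 * math.log(1.0 / D)

# Right-hand side of (68) for r = 2, which simplifies to 0.5*log(pi*e/6).
limit = 0.5 * math.log(math.pi * math.e / 6.0)
for D in (1e-2, 1e-3, 1e-4):
    print(D, excess_rate_gaussian(D), limit)
```

Under these assumptions the printed excess rate approaches the limit as $D$ decreases, which is the behaviour that the necessary condition of Theorem 7 singles out.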

A quantizer $q(\cdot)$ is said to be almost regular if there exists a set $\mathcal{S} \subset \mathcal{X}$ of Lebesgue measure zero such that on $\mathcal{X} \setminus \mathcal{S}$ the quantization regions are intervals containing the reconstruction value [11]. (For all $x \in \mathcal{S}$, we can define $q(x)$ in an arbitrary manner without changing the entropy and distortion of $q(\cdot)$.) In other words, an almost-regular quantizer $q(\cdot)$ can be written as

$$q(x) = \sum_i c_i \, 1\{a_i \le x < b_i\}, \quad \text{for } x \in \mathcal{X} \setminus \mathcal{S} \tag{70a}$$

$$q(x) = \sum_i \hat{x}_i \, 1\{x \in \mathcal{S}_i\}, \quad \text{for } x \in \mathcal{S} \tag{70b}$$

where $a_i \le c_i < b_i$, and where $\hat{x}_i$ and $\mathcal{S}_i$ are arbitrary.

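For almost-regular quantizers, condition (69) in Theorem 7 can be simplified as follows. Firstly, since the source has a pdf and $\mathcal{S}$ has measure zero,

$$\sum_i \Pr(X \in \mathcal{S}_i \cap \mathcal{S}) = 0. \tag{71}$$

Secondly, for any quantization region $[a_i, b_i) \subset \mathcal{X} \setminus \mathcal{S}$ and reconstruction value $c_i \in [a_i, b_i)$, we have

$$1\left\{ \left| \frac{\Delta_{i,\rho D^{1/r}}^r}{D} - 2^r (1 + r) \right| < \vartheta \right\} = 1\left\{ \left| \frac{\Delta_i^r}{D} - 2^r (1 + r) \right| < \vartheta \right\}, \quad \rho > \bigl(2^r (1 + r)\bigr)^{1/r} \tag{72}$$

where $\Delta_i = b_i - a_i$. Consequently,

(73)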


We thus have the following result: 

Corollary 8: Suppose the sequence of almost-regular quantizers $q(\cdot)$ (parametrized by $D$) with quantization regions $\mathcal{S}_i$, $i \in \mathbb{Z}$, satisfying the distortion constraint $E[|X - q(X)|^r] \le D$ achieves the asymptotic excess rate

$$\lim_{D \downarrow 0} \bigl\{ H(q(X)) - R(D) \bigr\} = \frac{1}{r} \log\left( \frac{\Gamma(1 + 1/r)^r \, e}{1 + 1/r} \right). \tag{74}$$

Then,

$$\lim_{D \downarrow 0} \sum_i \Pr(X \in \mathcal{S}_i) \, 1\left\{ \left| \frac{\Delta_i^r}{D} - 2^r (1 + r) \right| < \vartheta \right\} = 1, \quad \text{for every } \vartheta > 0. \tag{75}$$

Here, $\Delta_i$ denotes the Lebesgue measure of $\mathcal{S}_i$.

Again, interpreting the quantizer as a random variable that takes on the value $\mathcal{S}_i$ with probability $\Pr(X \in \mathcal{S}_i)$, Corollary 8 can be paraphrased as “any sequence of almost-regular quantizers achieving $R_{r,1}$ must converge in probability to a uniform quantizer as $D \to 0$.”
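The quantity inside the limit of (75) can be evaluated for concrete quantizers. The sketch below does so at one fixed distortion level, so it only illustrates the condition and does not verify the asymptotic statement. It assumes a source that is uniform on $[0, 1)$, midpoint reconstruction values, and $r = 2$; the helper names `mass_near_uniform` and `mean_distortion` are hypothetical.

```python
def mass_near_uniform(lengths, probs, D, r, theta):
    """Probability mass of the cells whose length Delta_i satisfies
    |Delta_i**r / D - 2**r * (1 + r)| < theta, as in condition (75)."""
    target = 2.0 ** r * (1.0 + r)
    return sum(p for length, p in zip(lengths, probs)
               if abs(length ** r / D - target) < theta)

def mean_distortion(lengths, probs, r):
    """Mean r-th power distortion when the source is uniform within each
    cell and the reconstruction value is the cell midpoint."""
    return sum(p * (length / 2.0) ** r / (1.0 + r)
               for length, p in zip(lengths, probs))

r, theta = 2, 1.0

# Uniform quantizer: 20 equal cells on [0, 1).
uni_len = [1.0 / 20] * 20
uni_prob = [1.0 / 20] * 20
D_uni = mean_distortion(uni_len, uni_prob, r)

# Two-scale quantizer: 10 coarse cells on [0, 0.5), 40 fine cells on [0.5, 1).
mix_len = [0.5 / 10] * 10 + [0.5 / 40] * 40
mix_prob = list(mix_len)  # uniform source: cell probability equals cell length
D_mix = mean_distortion(mix_len, mix_prob, r)

print(mass_near_uniform(uni_len, uni_prob, D_uni, r, theta))
print(mass_near_uniform(mix_len, mix_prob, D_mix, r, theta))
```

For the uniform quantizer every cell satisfies the condition, whereas for the two-scale quantizer neither cell size is within $\vartheta$ of the target once the lengths are normalized by its own distortion.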


VI. Balls versus Tessellating Polytopes 


The lower bound (15) on the excess rate presented in Theorem 1 hinges on the fact that the distortion over the quantization region $\mathcal{S}_i$, i.e., $\int_{\mathcal{S}_i} \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x}$, is lower-bounded by the distortion over a ball around $\hat{\mathbf{x}}_i$ with the same volume (cf. (84) in the proof of Theorem 1 with $\mathcal{B}_{i,\epsilon}$ replaced by $\mathcal{S}_i$ and with $\Delta_{i,\epsilon}$ replaced by $\Delta_i$). Since the one-dimensional ball is an interval and, hence, tessellates $\mathbb{R}$, it follows that for scalar sources the lower bound (15) is achieved by a tessellating quantizer, so in this case it is tight. However, it is expected that this is no longer true for multi-dimensional sources, since in general balls do not tessellate the space. In fact, it is unclear whether there exists any (possibly non-tessellating) vector quantizer that achieves (15) for multi-dimensional sources.

To assess the tightness of the obtained lower bound, we compare it numerically with the excess rates achievable by several lattice quantizers. To this end, we use Linder and Zeger’s upper bound for tessellating quantizers (23) together with the normalized second moments $\ell(\mathcal{P})$ of various lattice quantizers tabulated in [22, Table I]. In order to better compare our results with previous works, in this section we consider the excess rate per dimension, defined as $\bar{R}_{r,d} \triangleq R_{r,d}/d$. The excess rate per dimension is relevant, for example, in the analysis of quantization schemes that buffer $d$ consecutive symbols of a one-dimensional memoryless source and then quantize them using a $d$-dimensional vector quantizer.

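For the sake of simplicity, we only consider quadratic distortion and the Euclidean norm. In this case, the lower bound (15) becomes

$$\bar{R}_{2,d} \ge \frac{1}{2} \log\left( \frac{e \, \Gamma(1 + d/2)^{2/d}}{1 + d/2} \right). \tag{76}$$

Furthermore, the upper bound corresponding to tessellating quantizers (23) becomes

$$\bar{R}_{2,d} \le \frac{1}{2} \log\left( 2 \pi e \inf_{\mathcal{P}} \ell(\mathcal{P}) \right). \tag{77}$$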


Another upper bound on $\bar{R}_{2,d}$ follows from an upper bound on $b_{r,d}$ in Zador’s theorem (9) that was presented in [6]. This upper bound is based on random coding arguments and yields, for quadratic distortion and the Euclidean norm, the bound (78).

The bound (78) demonstrates that $\bar{R}_{2,d}$ vanishes as $d$ tends to infinity. This is perhaps not very surprising, since the rate-distortion function $R(D)$ is essentially achieved by a vector quantizer whose dimension tends to infinity.

In Figure 1, we depict the bounds (76) and (78) as a function of the dimension $d$. We further show several achievability results based on lattice quantizers (77). The normalized second moments $\ell(\mathcal{P})$ corresponding to these lattice quantizers were tabulated by Conway and Sloane in [22, Table I]. In fact, Figure 1 is equivalent to [22, Figure 1] with the only difference that here we plot the excess rate per dimension whereas Conway and Sloane plot the normalized second moment. Specifically, we include the excess rates per dimension incurred by a (one-dimensional) uniform quantizer, by a (two-dimensional) hexagonal quantizer, and by the three-dimensional tessellating quantizer whose regions are cuboctahedrons. These quantizers correspond to the so-called Voronoi lattices of the first type $\mathbb{Z}$ (the integers), $A_2$ (the two-dimensional hexagonal lattice), and $A_3^*$ (the body-centered cubic lattice). For $d > 3$, we further include the excess rates per dimension attained by the $D_d^*$ lattices. Labeled with cross markers, we show the excess rates per dimension corresponding to the lattices $E_6^*$, $E_7^*$, the Gosset lattice $E_8$, the Coxeter-Todd lattice $K_{12}$, the Barnes-Wall lattice $\Lambda_{16}$, and the Leech lattice $\Lambda_{24}$. We refer to [22] and references therein for further details.

Figure 1: Bounds on the excess rate per dimension $\bar{R}_{2,d}$ (in bits per source dimension) of a $d$-dimensional vector quantizer. The excess rate per dimension attained by lattice quantizers was obtained by applying to (77) the normalized second moments tabulated in [22, Table I].

Finally, we compare the obtained bounds with a conjectured lower bound by Conway and Sloane [22, Eq. (4)] that follows by computing the distortion attained by a set of reconstruction points located at the vertices of a $d$-dimensional tetrahedron. Note that this bound was computed for fixed-rate quantizers, i.e., for quantizers that have a finite number $M$ of quantization regions and whose rate is defined as $\log M$. While the excess rate achievable by a fixed-rate quantizer can also be achieved by an entropy-constrained quantizer, the converse is not necessarily true. It is thus prima facie unclear whether Conway and Sloane’s conjectured lower bound would also apply to entropy-constrained quantizers. Nevertheless, we decided to include it here since it is remarkably close to the excess rates per dimension corresponding to the lattices $E_8$ and $\Lambda_{24}$.

As mentioned above, the excess rate per dimension vanishes as $d$ tends to infinity. However, as illustrated by Figure 1, it decays slowly: for example, for a 10-dimensional vector quantizer we still have

$$\bar{R}_{2,10} \ge \frac{1}{2} \log_2\left( \frac{e \, \Gamma(6)^{1/5}}{6} \right) \approx 0.1196 \text{ bits per source dimension} \tag{79}$$

which is, arguably, not much smaller than the excess rate per dimension of the (one-dimensional) uniform quantizer

$$\bar{R}_{2,1} = \frac{1}{2} \log_2(\pi e / 6) \approx 0.2546 \text{ bits per source dimension.}$$

(Here $\log_2(\cdot)$ denotes the binary logarithm.) In general, the bounds on $\bar{R}_{2,d}$ given in (76) and (78) are of the order $O(\log d / d)$.
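The two numerical values quoted above can be reproduced with a few lines of code. The sketch below assumes that the quadratic-distortion lower bound has the closed form $\frac{1}{2}\log_2\bigl(e\,\Gamma(1+d/2)^{2/d}/(1+d/2)\bigr)$, which is the form consistent with (79) for $d = 10$ and with the uniform-quantizer value for $d = 1$. The function names are hypothetical, and the only lattice evaluated is the integer lattice, whose normalized second moment is $1/12$.

```python
import math

def lower_bound_bits(d):
    """Lower bound on the excess rate per dimension (bits) for quadratic
    distortion: 0.5*log2(e * Gamma(1 + d/2)**(2/d) / (1 + d/2))."""
    log_gamma_term = (2.0 / d) * math.lgamma(1.0 + d / 2.0)
    nats = 0.5 * (1.0 + log_gamma_term - math.log(1.0 + d / 2.0))
    return nats / math.log(2.0)

def lattice_bound_bits(normalized_second_moment):
    """Upper bound of the form (77) for a single tessellating cell:
    0.5*log2(2*pi*e*G), with G the normalized second moment."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * normalized_second_moment)

for d in (1, 2, 3, 10, 24):
    print(d, round(lower_bound_bits(d), 4))
print("integer lattice:", round(lattice_bound_bits(1.0 / 12.0), 4))
```

For $d = 1$ the lower bound and the integer-lattice upper bound coincide, which reflects the tightness of the bound for scalar sources.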

Observe that for multi-dimensional sources the gap between the lower bound (76) and the excess rate per 
dimension achievable with lattice quantizers is substantial. This gap is partly due to the fact that, in order to derive 
the lower bound (15), we lower-bounded the distortion over the quantization region Si by that over a ball with 
the same volume, cf. (84). To obtain a tighter lower bound, we may need a more accurate approximation of this 
distortion that, like the conjectured bound by Conway and Sloane, takes the geometry of the optimal quantization 
regions into account. 


VII. Conclusions

The nonnegativity of relative entropy implies that the differential entropy of a random variable $X$ with pdf $f$ is upper-bounded by $-E[\log g(X)]$ for any arbitrary pdf $g$. Using this inequality with a cleverly chosen $g$, we derived a lower bound on the asymptotic excess rate of entropy-constrained vector quantization. Specialized to the one-dimensional case and quadratic distortion, this bound coincides with the excess rate obtained by Gish and Pierce in [4], and by Gray et al. in [10] particularized for scalar quantizers. The proposed derivation thus recovers the well-known result that uniform quantizers are asymptotically optimal as the allowed distortion vanishes.
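The inequality $h(X) \le -E[\log g(X)]$ that underlies the derivation is easy to check numerically. The sketch below is only an illustration under assumed choices that do not appear in the paper: $f$ is the standard Gaussian pdf, $g$ is a unit-scale Laplace pdf, and the expectations are approximated with a midpoint rule on $[-12, 12]$.

```python
import math

def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_pdf(x, b=1.0):
    return math.exp(-abs(x) / b) / (2.0 * b)

def expect_neg_log(f, g, lo=-12.0, hi=12.0, n=200_000):
    """Midpoint-rule approximation of -E_f[log g(X)] = -int f(x) log g(x) dx."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total -= f(x) * math.log(g(x)) * step
    return total

h_f = expect_neg_log(gaussian_pdf, gaussian_pdf)   # differential entropy h(X)
cross = expect_neg_log(gaussian_pdf, laplace_pdf)  # -E[log g(X)]
# The gap cross - h_f is the relative entropy between f and g, hence >= 0.
print(h_f, cross, cross - h_f)
```

Choosing $g = f$ closes the gap, which is why the quality of the resulting bound depends on how well the auxiliary pdf $g$ is matched to the problem.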

Our result holds for any $d$-dimensional memoryless source $\mathbf{X}$ that satisfies $|h(\mathbf{X})| < \infty$ and $H(\lfloor \mathbf{X} \rfloor) < \infty$. The presented proof is thus as general as the proof by Gray et al., and it is more general than the proof by Gish and Pierce. In fact, it has recently been shown that these conditions are necessary and sufficient for the Shannon lower bound to be asymptotically tight for vanishing distortion, and that $H(\lfloor \mathbf{X} \rfloor) < \infty$ is a necessary and sufficient condition for the rate-distortion function to be finite [9]. Our result thus holds for the most general conditions that can be imposed in the analysis of high-resolution quantizers.

The derivation of the lower bound reveals a necessary condition for a sequence of quantizers (parametrized by $D$) to achieve the asymptotic excess rate. Specifically, we demonstrated for scalar sources that the intersection of the quantization region $\mathcal{S}_i$ with the interval $[\hat{x}_i - \rho D^{1/r}, \hat{x}_i + \rho D^{1/r}]$ must have a Lebesgue measure that converges in probability to $2(1+r)^{1/r} D^{1/r}$ as $D \to 0$ and $\rho \to \infty$. This implies that any sequence of almost-regular quantizers achieving the asymptotic excess rate must converge in probability to a uniform quantizer as $D \to 0$. Since almost-regular quantizers achieve $D_{r,1}(R)$ when $r > 1$, this in turn suggests that asymptotically-optimal quantizers must essentially be uniform.

While the presented bound is tight for scalar sources, it is unclear whether the same is true for multi-dimensional sources. Indeed, its derivation hinges on the fact that the distortion over the quantization region $\mathcal{S}_i$ is lower-bounded by the distortion over a ball around $\hat{\mathbf{x}}_i$ with the same volume, cf. (84). Since the one-dimensional ball is an interval and, hence, tessellates $\mathbb{R}$, it follows that for one-dimensional sources the converse bound (15) is achieved by a tessellating quantizer (which in this case is the uniform quantizer). However, it is expected that this is no longer true for multi-dimensional sources, since in general balls do not tessellate the space. It is as yet unclear whether there exists any (possibly non-tessellating) vector quantizer that achieves our converse bound for multi-dimensional sources.


Appendix A 
Proof of Lemma 6 

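To prove Lemma 6, we first fix an arbitrary constant $\eta > 0$ and divide the indices $i$ according to whether $\Delta_{i,\epsilon} > \eta V_d \epsilon^d$ or not. Specifically, let

$$\mathcal{I} \triangleq \bigl\{ i \in \mathbb{Z} : \Delta_{i,\epsilon} > \eta V_d \epsilon^d \bigr\} \tag{80}$$

and divide the sum on the LHS of (65) into

$$\sum_i \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} = \sum_{i \in \mathcal{I}} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} + \sum_{i \in \mathcal{I}^c} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \tag{81}$$

where $\mathcal{I}^c$ denotes the complement of $\mathcal{I}$. For every $i \in \mathcal{I}^c$ we have $\Delta_{i,\epsilon} \le \eta V_d \epsilon^d$, so the second sum on the RHS of (81) can be upper-bounded as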


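$$\sum_{i \in \mathcal{I}^c} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \le \sum_{i \in \mathcal{I}^c} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{(\eta V_d \epsilon^d)^{r/d}}{D} \le \frac{(\eta V_d)^{r/d}}{\kappa} \tag{82}$$

where the second step follows because $\epsilon^r = D/\kappa$ and because, by definition, the sets $\mathcal{B}_{i,\epsilon}$ are disjoint, so the sum of the probabilities $\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})$ is equal to the probability of $\bigcup_i \mathcal{B}_{i,\epsilon}$, which is upper-bounded by 1.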

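To upper-bound the first sum on the RHS of (81), we begin by lower-bounding $E\bigl[\|\mathbf{X} - \hat{\mathbf{X}}\|^r\bigr]$:

$$\begin{aligned}
E\bigl[\|\mathbf{X} - \hat{\mathbf{X}}\|^r\bigr] &= \sum_i \int_{\mathcal{S}_i} f_{\mathbf{X}}(\mathbf{x}) \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \\
&\ge \sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} f_{\mathbf{X}}(\mathbf{x}) \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \\
&= \sum_{i \in \mathcal{I}} \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \\
&\quad - \sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} \left[ \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} - f_{\mathbf{X}}(\mathbf{x}) \right] \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x}.
\end{aligned} \tag{83}$$

The region $\mathcal{B}$ of volume $\Delta_{i,\epsilon}$ that minimizes $\int_{\mathcal{B}} \|\mathbf{x} - \hat{\mathbf{x}}\|^r \,\mathrm{d}\mathbf{x}$ is a ball around $\hat{\mathbf{x}}$. We thus have [16, Section III]

$$\int_{\mathcal{B}_{i,\epsilon}} \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \ge \frac{d}{d + r} V_d^{-r/d} \Delta_{i,\epsilon}^{1 + r/d} \tag{84}$$

which yields for the first term on the RHS of (83)

$$\sum_{i \in \mathcal{I}} \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \ge \frac{1}{V_d^{r/d} (1 + r/d)} \sum_{i \in \mathcal{I}} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \Delta_{i,\epsilon}^{r/d}. \tag{85}$$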
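The ball bound used in (84) can be sanity-checked numerically. The following sketch assumes the Euclidean norm, so that $V_d = \pi^{d/2}/\Gamma(1 + d/2)$, and compares the $r$th moment of a centered ball, $\frac{d}{d+r} V_d^{-r/d} \Delta^{1+r/d}$, with the second moment of a centered cube of the same volume. The function names are hypothetical.

```python
import math

def unit_ball_volume(d):
    """Volume V_d of the d-dimensional Euclidean unit ball."""
    return math.pi ** (d / 2.0) / math.gamma(1.0 + d / 2.0)

def ball_moment(volume, d, r):
    """int ||x||^r dx over a centered Euclidean ball of the given volume,
    i.e. d/(d+r) * V_d**(-r/d) * volume**(1 + r/d)."""
    v_d = unit_ball_volume(d)
    return d / (d + r) * v_d ** (-r / d) * volume ** (1.0 + r / d)

def cube_second_moment(volume, d):
    """int ||x||^2 dx over a centered cube of the given volume."""
    side = volume ** (1.0 / d)
    return d * side ** (d + 2) / 12.0

for d in (1, 2, 3, 8):
    print(d, ball_moment(1.0, d, 2), cube_second_moment(1.0, d))
```

In one dimension the ball is an interval and the two values coincide; for $d \ge 2$ the ball is strictly better than the cube, which illustrates why a tessellating region generally cannot attain the bound.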


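Multiplying both sides of (83) by $V_d^{r/d}(1 + r/d)/D$, applying (85) to (83), and using that $E\bigl[\|\mathbf{X} - \hat{\mathbf{X}}\|^r\bigr] \le D$, we obtain

$$\sum_{i \in \mathcal{I}} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \le V_d^{r/d} \left(1 + \frac{r}{d}\right) \left( 1 + \frac{1}{D} \sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} \left[ \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} - f_{\mathbf{X}}(\mathbf{x}) \right] \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \right). \tag{86}$$

We next introduce the pdf

$$f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) \triangleq \sum_{i \in \mathcal{I}} \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} + \sum_{i \in \mathcal{I}^c} f_{\mathbf{X}}(\mathbf{x}) 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} + f_{\mathbf{X}}(\mathbf{x}) 1\Bigl\{\mathbf{x} \notin \bigcup_i \mathcal{B}_{i,\epsilon}\Bigr\}, \quad \mathbf{x} \in \mathbb{R}^d \tag{87}$$

which allows us to write

$$\sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} \left[ \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} - f_{\mathbf{X}}(\mathbf{x}) \right] \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} = \sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} \left[ f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right] \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x}. \tag{88}$$

Since $\|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \le \epsilon^r = D/\kappa$ for $\mathbf{x} \in \mathcal{B}_{i,\epsilon}$, $i \in \mathcal{I}$, and $f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) = 0$ otherwise, we have

$$\frac{1}{D} \sum_{i \in \mathcal{I}} \int_{\mathcal{B}_{i,\epsilon}} \left[ f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right] \|\mathbf{x} - \hat{\mathbf{x}}_i\|^r \,\mathrm{d}\mathbf{x} \le \frac{1}{\kappa} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x}. \tag{89}$$

Combining this upper bound with (81), (82), and (86), we obtain

$$\sum_i \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \le V_d^{r/d} \left(1 + \frac{r}{d}\right) \left( 1 + \frac{1}{\kappa} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x} \right) + \frac{(\eta V_d)^{r/d}}{\kappa}. \tag{90}$$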

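We next show that, for every $\eta > 0$,

$$\varlimsup_{D \downarrow 0} \sup_{q(\cdot)} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x} = 0. \tag{91}$$

(Note that $f_{\mathbf{X}}^{(D)}$ depends on $q(\cdot)$ and $D$ via $\mathcal{B}_{i,\epsilon}$, $i \in \mathbb{Z}$.) It then follows that

$$\varlimsup_{D \downarrow 0} \sup_{q(\cdot)} \sum_i \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon}) \frac{\Delta_{i,\epsilon}^{r/d}}{D} \le V_d^{r/d} \left(1 + \frac{r}{d}\right) + \frac{(\eta V_d)^{r/d}}{\kappa} \tag{92}$$

which proves Lemma 6 upon letting $\eta$ tend to zero from above.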

It thus remains to prove (91). By definition, $f_{\mathbf{X}}^{(D)}$ differs from $f_{\mathbf{X}}$ only when $\mathbf{x} \in \mathcal{B}_{i,\epsilon}$, $i \in \mathcal{I}$. Since the family of sets $\mathcal{B}_{i,\epsilon}$, $i \in \mathcal{I}$ (parametrized by $D$) has bounded eccentricity, it follows from Lebesgue’s differentiation theorem that $f_{\mathbf{X}}^{(D)}$ converges to $f_{\mathbf{X}}$ almost everywhere as $D$ (and hence also $\epsilon$) tends to zero, which by Scheffé’s lemma then implies (91). However, compared to the standard setting under which Lebesgue’s differentiation theorem is proven, our setting is slightly more complicated, since as $D$ tends to zero not only the diameters of the sets $\mathcal{B}_{i,\epsilon}$ decay, but also their locations in $\mathbb{R}^d$ may change. For completeness, we therefore provide all the steps, even though they follow closely the standard proof of the Lebesgue differentiation theorem.
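The convergence that the remainder of this appendix establishes can be visualized in a much simpler setting than the one treated in the proof. The sketch below assumes a one-dimensional standard Gaussian density and a fixed grid of cells (so the cell locations do not move with $D$), replaces the density by its cell averages, and approximates the $L^1$ distance to the original density. All names are hypothetical.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def l1_gap(width, span=8.0, samples_per_cell=200):
    """Approximate int |f_avg(x) - f(x)| dx on [-span, span], where f_avg
    replaces the N(0, 1) density f by its average over each cell of the
    given width (midpoint rule inside every cell)."""
    n_cells = int(round(2.0 * span / width))
    step = width / samples_per_cell
    gap = 0.0
    for c in range(n_cells):
        left = -span + c * width
        average = (cdf(left + width) - cdf(left)) / width
        for s in range(samples_per_cell):
            gap += abs(average - pdf(left + (s + 0.5) * step)) * step
    return gap

gaps = [l1_gap(w) for w in (1.0, 0.5, 0.25, 0.125)]
print(gaps)
```

The printed distances shrink as the cells get smaller, in line with the almost-everywhere convergence of the cell averages and the resulting $L^1$ convergence.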

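We first note that the integral in (91) is nonnegative and bounded, so its supremum is finite and for every $\nu > 0$ there exists a sequence of quantizers (parametrized by $D$) such that

$$\varlimsup_{D \downarrow 0} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x} \ge \varlimsup_{D \downarrow 0} \sup_{q(\cdot)} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x} - \nu. \tag{93}$$

Since $\nu > 0$ is arbitrary, it follows that, in order to prove (91), it suffices to show that for any sequence of quantizers (parametrized by $D$)

$$\lim_{D \downarrow 0} \int \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \mathrm{d}\mathbf{x} = 0. \tag{94}$$

Specifically, we shall show that for any sequence of quantizers (parametrized by $D$)

$$\lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \varlimsup_{D \downarrow 0} \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| > 2\vartheta \right\} \right) = 0, \quad \text{for every } \vartheta > 0 \tag{95}$$

where $\lambda(\cdot)$ denotes the Lebesgue measure on $\mathbb{R}^d$. It then follows that $f_{\mathbf{X}}^{(D)}$ converges to $f_{\mathbf{X}}$ almost everywhere as $D \to 0$ since

$$\left\{ \mathbf{x} \in \mathbb{R}^d : \varlimsup_{D \downarrow 0} \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| > 0 \right\} = \bigcup_{n=1}^{\infty} \left\{ \mathbf{x} \in \mathbb{R}^d : \varlimsup_{D \downarrow 0} \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| > \frac{1}{n} \right\} \tag{96}$$

and the countable union of sets of measure zero has measure zero. By Scheffé’s lemma, almost everywhere convergence of $f_{\mathbf{X}}^{(D)}$ to $f_{\mathbf{X}}$ implies (94), which together with (93) proves the desired result (91).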

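We thus set out to prove (95). By the definition of $f_{\mathbf{X}}^{(D)}$ and the triangle inequality,

$$\left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| \le \sum_{i \in \mathcal{I}} \left| \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} - f_{\mathbf{X}}(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\}. \tag{97}$$

We next approximate $\frac{1}{\Delta_{i,\epsilon}} \Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})$ by replacing $f_{\mathbf{X}}$ by a continuous function $g$. Indeed, since $f_{\mathbf{X}}$ is integrable, for every $\varepsilon > 0$ there exists a continuous function $g$ such that [23, Theorem 2.4.14, p. 92]

$$\int_{\mathbb{R}^d} |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| \,\mathrm{d}\mathbf{x} \le \varepsilon. \tag{98}$$

It then follows that, for every $\mathbf{x} \in \mathcal{B}_{i,\epsilon}$,

$$\left| \frac{\Pr(\mathbf{X} \in \mathcal{B}_{i,\epsilon})}{\Delta_{i,\epsilon}} - f_{\mathbf{X}}(\mathbf{x}) \right| \le \left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| + \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} |f_{\mathbf{X}}(\mathbf{y}) - g(\mathbf{y})| \,\mathrm{d}\mathbf{y} + |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})|. \tag{99}$$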

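Let $\mathcal{B}(\mathbf{c}, \rho) = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x} - \mathbf{c}\| \le \rho\}$ denote the $d$-dimensional ball of radius $\rho$ centered at $\mathbf{c}$. Note that $\lambda(\mathcal{B}(\hat{\mathbf{x}}_i, \epsilon)) = V_d \epsilon^d$. For every $\mathbf{x} \in \mathcal{B}_{i,\epsilon}$ and $i \in \mathcal{I}$, the second term on the RHS of (99) can be upper-bounded by

$$\frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} |f_{\mathbf{X}}(\mathbf{y}) - g(\mathbf{y})| \,\mathrm{d}\mathbf{y} \le \frac{2^d}{\eta} \frac{1}{\lambda(\mathcal{B}(\mathbf{x}, 2\epsilon))} \int_{\mathcal{B}(\mathbf{x}, 2\epsilon)} |f_{\mathbf{X}}(\mathbf{y}) - g(\mathbf{y})| \,\mathrm{d}\mathbf{y} \le \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) \tag{100}$$

where $(f_{\mathbf{X}} - g)^*$ denotes the Hardy-Littlewood maximal function for $f_{\mathbf{X}} - g$, i.e.,

$$(f_{\mathbf{X}} - g)^*(\mathbf{x}) = \sup_{\rho > 0} \frac{1}{\lambda(\mathcal{B}(\mathbf{x}, \rho))} \int_{\mathcal{B}(\mathbf{x}, \rho)} |f_{\mathbf{X}}(\mathbf{y}) - g(\mathbf{y})| \,\mathrm{d}\mathbf{y}, \quad \mathbf{x} \in \mathbb{R}^d. \tag{101}$$

In (100), we have used that, for every $\mathbf{x} \in \mathcal{B}_{i,\epsilon}$ and $i \in \mathcal{I}$, we have $\mathcal{B}_{i,\epsilon} \subseteq \mathcal{B}(\hat{\mathbf{x}}_i, \epsilon) \subseteq \mathcal{B}(\mathbf{x}, 2\epsilon)$ and

$$\Delta_{i,\epsilon} > \eta \lambda(\mathcal{B}(\hat{\mathbf{x}}_i, \epsilon)) = \eta 2^{-d} \lambda(\mathcal{B}(\mathbf{x}, 2\epsilon)).$$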


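Combining (99) and (100) with (97), we obtain

$$\begin{aligned}
\left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| &\le \sum_{i \in \mathcal{I}} \left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} \\
&\quad + \sum_{i \in \mathcal{I}} \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) \, 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} + \sum_{i \in \mathcal{I}} |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| \, 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} \\
&\le \sum_{i \in \mathcal{I}} \left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} + \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) + |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})|, \quad \mathbf{x} \in \mathbb{R}^d
\end{aligned} \tag{102}$$

since the sets $\mathcal{B}_{i,\epsilon}$, $i \in \mathcal{I}$ are disjoint. The second and third terms on the RHS of (102) are independent of $D$ and $q(\cdot)$. The first term on the RHS of (102) vanishes as $D$ tends to zero for any sequence of quantizers. Indeed, the continuity of $g$ implies that for every $\vartheta > 0$ and $\mathbf{x} \in \mathbb{R}^d$ there exists an $\epsilon_0 > 0$ such that

$$|g(\mathbf{y}) - g(\mathbf{x})| \le \vartheta, \quad \text{for } \|\mathbf{x} - \mathbf{y}\| \le 2\epsilon_0. \tag{103}$$

Since $\mathbf{x}, \mathbf{y} \in \mathcal{B}_{i,\epsilon}$ satisfy $\|\mathbf{x} - \mathbf{y}\| \le 2\epsilon$, it follows that for every $\vartheta > 0$ and $\mathbf{x} \in \mathbb{R}^d$ there exists an $\epsilon_0 > 0$ such that

$$\left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} \le \vartheta \, 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\}, \quad \epsilon \le \epsilon_0. \tag{104}$$

Using that the sets $\mathcal{B}_{i,\epsilon}$, $i \in \mathcal{I}$ are disjoint, we conclude that for every $\vartheta > 0$ and $\mathbf{x} \in \mathbb{R}^d$ there exists an $\epsilon_0 > 0$ such that

$$\sum_{i \in \mathcal{I}} \left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} \le \vartheta, \quad \epsilon \le \epsilon_0. \tag{105}$$

Since $\vartheta > 0$ is arbitrary and $\epsilon$ vanishes as $D \to 0$, this implies that for every $\mathbf{x} \in \mathbb{R}^d$ and any sequence of quantizers

$$\lim_{D \downarrow 0} \sum_{i \in \mathcal{I}} \left| \frac{1}{\Delta_{i,\epsilon}} \int_{\mathcal{B}_{i,\epsilon}} g(\mathbf{y}) \,\mathrm{d}\mathbf{y} - g(\mathbf{x}) \right| 1\{\mathbf{x} \in \mathcal{B}_{i,\epsilon}\} = 0. \tag{106}$$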


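We conclude the proof of Lemma 6 by applying (102) and (106) to upper-bound the Lebesgue measure on the LHS of (95). Indeed, we have

$$\begin{aligned}
\lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \varlimsup_{D \downarrow 0} \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| > 2\vartheta \right\} \right) &\le \lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) + |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| > 2\vartheta \right\} \right) \\
&\le \lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) > \vartheta \right\} \right) + \lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| > \vartheta \right\} \right).
\end{aligned} \tag{107}$$

The first term on the RHS of (107) can be upper-bounded by using the Hardy-Littlewood maximal inequality [24, Theorem 3.4, p. 55]

$$\lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \frac{2^d}{\eta} (f_{\mathbf{X}} - g)^*(\mathbf{x}) > \vartheta \right\} \right) \le \frac{2^d a_d}{\eta \vartheta} \int |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| \,\mathrm{d}\mathbf{x} \tag{108}$$

for some constant $a_d$ that only depends on $d$. Likewise, the second term on the RHS of (107) can be upper-bounded using Chebyshev’s inequality [23, Theorem 4.10.7, p. 192]

$$\lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| > \vartheta \right\} \right) \le \frac{1}{\vartheta} \int |f_{\mathbf{X}}(\mathbf{x}) - g(\mathbf{x})| \,\mathrm{d}\mathbf{x}. \tag{109}$$

Combining (108) and (109) with (98) and (107), it follows that

$$\lambda\left( \left\{ \mathbf{x} \in \mathbb{R}^d : \varlimsup_{D \downarrow 0} \left| f_{\mathbf{X}}^{(D)}(\mathbf{x}; \{\mathcal{B}_{i,\epsilon}\}) - f_{\mathbf{X}}(\mathbf{x}) \right| > 2\vartheta \right\} \right) \le \left( 1 + \frac{2^d a_d}{\eta} \right) \frac{\varepsilon}{\vartheta}. \tag{110}$$

This proves (95) upon letting $\varepsilon$ tend to zero from above, which was the last step required to prove Lemma 6.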


Appendix B 
Proof of Theorem 7 

Following the steps (51)-(61) in the proof of Theorem 1 in Section IV-B particularized for d = 1, we obtain that 
H{q{X)) - R{D) > I log (r2’T(l + l/rYe) - lY.^r{X G S,.) log + 2r 

- Klog -f 2r(l/r)^ - K 




1 

6 ' 


( 111 ) 


Recall that $\epsilon^r = D/\kappa$. The last three terms on the RHS of (111) are independent of $D$ and vanish as we first let $\kappa \to 0$ and then $\delta \to \infty$. To achieve


1 -f 1/r 

a sequence of quantizers (parametrized by D) must therefore satisfy 

gsl;Eft..)(t't 

(As $\kappa \to 0$, the term on the LHS of (112) becomes independent of $\vartheta > 0$.) For the sake of compactness, we shall use in the rest of the proof the following notation:^

Let V = 2’'(1 -f r). Further let u = 2r and recall that limK-j-o t> = 0 for every d > 0. Define 


A1 




_ A’’ 

= -jf>V + d 


(113a) 

(113b) 


^While all introduced quantities depend on $\kappa$, to keep the notation compact we only make the dependence on $D$ explicit.
and


Id = E P'<-^ <5 

ieXn 


A 

Mn = 


Ar 


(1 - P.)<Z 




£) JGir 


1 

1 - Pe 


qd = E p>‘(^ e 


i^Xo 


T^d = ^ E P'‘(^ e 

(1 - Pe)<?D /V D 

i£Xd 


(114a) 

(114b) 


— A 

9n = 


1 


1 - P, 


^ Pr(X e S,,,), 


iez\(i^ui£)) 

where was defined in (58). Finally, define 


- 1 

jgz\(i^uiD) 


Ar 


^ Pr(XGS.,e)^ (114c) 


=gDtD^ + 9n M 


Id t^D- 


By definition of I and I, we have 


< V — d and /i£) > + 1?. 


Furthermore, by Lemma 6 and (59), 

lim lim ud ^V. 

K-tO D^O 

Consequently, for any arbitrary £ > 0, there exist kq and Dq such that 

f^D<V + e, {k<kq,D<Dq). 

Without loss of generality, we implicitly assume that k and D are sufficiently small, so that (118) holds. 
We next apply steps similar to (62) and (63) to upper-bound 


(115) 


(116) 


(117) 


(118) 


1 

1 - P. 


^Pr(X eS.,,)log 




< 9 


D 


log log(AiD + u’') + log -f 


for r < 1 


(119a) 


and 


T— EPr(^eS,.)log 

-L ye 


^i,e 

Dl/r 


< r 




log + v) +qD log (md’’ +'^) + go (^D 


for r > 1. 


(119b) 


(120a) 


jj + uj + log +'>^)+gD- 

It follows that, for r < 1, any sequence of quantizers satisfying (112) must also satisfy 

lim Hm log -f £>’') -f Qd log (Jld +'<j'')+gD log (Md + “ log (1^ -f w’') | > 0. 

Likewise, for r > 1, any sequence of quantizers satisfying (112) must also satisfy 

lim lim log +V^ +qij log +'^) + go log (Md+ 1') - log + w) | > 0. (120b) 

We conclude the proof of Theorem 7 for the case $r > 1$ by demonstrating that any sequence of quantizers satisfying (120b) must satisfy

$$\lim_{\kappa \downarrow 0} \lim_{D \downarrow 0} q_D = 1, \quad \text{for every } \vartheta > 0. \tag{121}$$

Substituting $\rho = 1/\kappa$, this can be written as

lim lim -— ^ Pr(X e £, 1/^)1 

P^oo Dm 1 - a, ^ ’ 


b. 




D 


- 2 ’'(l + r) 


< t? > = 1 , for every 2 ? > 0 


( 122 ) 


which by Lemma 5 is equivalent to (69). The proof for r < 1 is almost identical and is therefore omitted. 

To prove (121) we use that, by the strict concavity of $x \mapsto \log x$, there exists a linear function $x \mapsto \ell_{x_0}(x)$ such that

$$\log(x + \upsilon) \le \ell_{x_0}(x), \quad x \ge 0 \tag{123}$$

with equality if, and only if, $x = x_0$. (Specifically, $\ell_{x_0}(x) = \frac{x + \upsilon}{x_0 + \upsilon} + \log(x_0 + \upsilon) - 1$.) Moreover, we have
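The tangent-line bound (123) can be checked on a grid. The sketch below takes $\ell_{x_0}(x) = \frac{x + \upsilon}{x_0 + \upsilon} + \log(x_0 + \upsilon) - 1$, i.e., the tangent to $x \mapsto \log(x + \upsilon)$ at $x_0$; the numerical values of $\upsilon$ and $x_0$ are arbitrary choices made for illustration, and the function name is hypothetical.

```python
import math

def tangent(x, x0, v):
    """Tangent line to x -> log(x + v) at the point x0."""
    return (x + v) / (x0 + v) + math.log(x0 + v) - 1.0

v, x0 = 0.3, 2.0
grid = [0.01 * k for k in range(0, 1001)]  # x in [0, 10]
gaps = [tangent(x, x0, v) - math.log(x + v) for x in grid]

# The gap is nonnegative everywhere and vanishes at x = x0.
print(min(gaps), tangent(x0, x0, v) - math.log(x0 + v))
```

The gap decreases towards $x_0$ from the left and increases to its right, which is the monotonicity property that the proof exploits further below.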


log 


(124) 


since xq > log(a:o + t>) — E[4o(-^)] (for any discrete random variable X) is monotonically increasing in xq 
and nonnegative for xq > E[2f], and since q^p]^^ + cIdVd)^ + ^HS of (120b) can thus be 

upper-bounded by 




log [p\ 


jr 

HD 


+ U - 


qD iog(: 


V Md’’ + V , 


-l/r 

[f^D 




. 1/’’ I 

\Pl^ +V^ 


-^D 


log ( 




(eL) - log 


yl/r 


l/r 

.t^D +'P’. 


By (117) and (123), the third term in (125) satisfies 


lim lim q 


D^O 




log ( 






(eL) - log 


V^/’' -bu' 

l/r 

+^. 


< 0 . 


We further have 


log(EL + o)-g„r-(aL)-*“« 


yl/r 


l/r 

.Md +^1, 


< log ((L - t?)!/’- + n) - f(vo+e)i/r ((L - I?)!/’-) + log 


(V + ey/^ + v 

V^!'- -b V 


^ K 


(125) 


(126) 


(127) 


and 


log (md+ u) - (md’’) - log ( 


yl/r 


V y^D + ri, 


< log ((y + + v)- l^v+eym {(V + 1 ?)'/") + log 


4k. 


nv + £y/^ + v\ 

V yi/’’ + ti ) 


(128) 


Here, we used (116) and (118) together with the facts that $x \mapsto \ell_{x_0}(x) - \log(x + \upsilon)$ is monotonically decreasing for $x < x_0$ and monotonically increasing for $x > x_0$, and $x_0 \mapsto \log(x_0 + \upsilon) - \ell_{x_0}(x)$ is monotonically increasing.

Combining (125)-(128), it follows that (120b) can only be satisfied if

limlimmaxjK, K} fg > 0. (129) 

Since for £ > 0 sufficiently small, we have 

limmax |K, K| < 0 (130) 

the condition (129), in turn, can only be satisfied if 

limlim (g + = 0. (131) 

Using that $q_D = 1 - \underline{q}_D - \overline{q}_D$, the claim (121) follows. This concludes the proof of Theorem 7.